I’m currently working on an ALU in VHDL.  To get warmed up, I tried my hand at some basic arithmetic, which I’m going to discuss in this blog post.  Here’s the diagram of the ALU:

A and B are the two inputs and F will be the result.  Each input and output is a 4-bit number.  S represents the function selected.  For now I’m going to use the following:

• 1 = Add
• 2 = Subtract
• 3 = Multiply
• 4 = Divide

I can add new functions as I need them.  I also have the option of expanding the number of bits to work with.  For now, I’ll just keep this simple.

I’m a software developer, so I look at this problem and I think “switch/case statement”.  As it turns out, there is a case statement in VHDL.  Some searching on Bing turned up this website: VHDL-Online, which I found to be very clear and easy to read.  I learn quicker from examples, so this site will be my go-to site for looking up VHDL syntax.  As I looked over the syntax examples, I noticed that the case statement doesn’t work without a process block (it is a sequential statement, so it has to live inside a process).  I just wrapped my case statement with a generic process block and came up with this block of code:

```library IEEE;
use IEEE.numeric_std.ALL;

entity smallalu is port (
S: in integer range 0 to 15;
A,B: in signed(3 DOWNTO 0);
F: out signed(3 DOWNTO 0)
);
end smallalu;

architecture Behavioral of smallalu is
begin
process (S,A,B)
begin
case S is
when 1 =>
F <= A+B;
when 2 =>
F <= A-B;
when 3 =>
F <= RESIZE(A*B,4);
when 4 =>
F <= A/B;
when others =>

end case;
end process;
end Behavioral;```

You’ll have to include “use IEEE.numeric_std.ALL;” at the top in order to use the math functions (and the “signed” data type).  Most of the code is pretty obvious: when the selector is set to “1”, add the two inputs and assign the result to the output, and so on.  The multiply was a bit of a challenge.  My simulation was showing “U” for the outputs of a multiply.  I did some investigating and discovered (or rather, rediscovered) that multiplying two 4-bit numbers results in an 8-bit number.  At one time, I knew that, but it’s been a while.  So I did some research and discovered the “resize” function, which let me take the 8-bit result and resize it to a 4-bit result.  The catch is that the product has to fit in 4 bits: I can’t really multiply any more than 2 bits from A with 2 bits from B, otherwise it’ll overflow.  So I’ll need to figure out a solution to that issue in the future, when I decide to expand the data path width.
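To see what the truncation does to a product that doesn’t fit, here is a small Python model of the multiply case.  It assumes the documented numeric_std behaviour for shrinking a signed value (keep the sign bit plus the rightmost bits).  The helper names as_signed, resize_signed and alu_multiply are made up for this sketch and are not part of any library.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def resize_signed(value, old_bits, new_bits):
    """Model of shrinking a signed vector: keep the sign bit of the
    original plus its (new_bits - 1) rightmost bits."""
    raw = value & ((1 << old_bits) - 1)          # two's-complement bit pattern
    sign = (raw >> (old_bits - 1)) & 1           # original sign bit
    low = raw & ((1 << (new_bits - 1)) - 1)      # rightmost bits that survive
    return as_signed((sign << (new_bits - 1)) | low, new_bits)


def alu_multiply(a, b):
    """4-bit signed multiply: 8-bit product squeezed back into 4 bits."""
    return resize_signed(a * b, 8, 4)


# Products that fit in -8..7 come through unchanged; larger ones do not.
for a, b in [(3, 2), (-2, 3), (3, 3), (-3, 3)]:
    print(a, "*", b, "->", alu_multiply(a, b))
```

Anything outside the range -8 to 7 comes back as a different number with the right sign, which is the overflow problem described above.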

There is also an unsigned data type.  If you change your inputs to unsigned, you must also change F to unsigned.  Everything will work correctly (except you’ll be working with positive numbers only).

Next, I wanted to add a reset or clear function.  Technically, it’s just a constant zero output because there are no latches inside this ALU.  This code is pure logic.  Here is what the code looks like after the change:

```entity smallalu is port (
S: in integer range 0 to 15;
A,B: in signed(3 DOWNTO 0);
F: out signed(3 DOWNTO 0)
);
end smallalu;

architecture Behavioral of smallalu is
begin
process (S,A,B)
begin
case S is
when 0 =>
F <= to_signed(0,4);
when 1 =>
F <= A+B;
when 2 =>
F <= A-B;
when 3 =>
F <= RESIZE(A*B,4);
when 4 =>
F <= A/B;
when others =>

end case;
end process;
end Behavioral;```

As you can see, assigning a zero to F is not just a matter of using an assignment.  A constant zero is assumed to be an integer data type.  The “to_signed()” function can be used to convert it into a signed data type.  This function requires the number of bits, so I put in a 4.  The simulation looks like this:

The first block up to 10ns is just the clear.  From 10ns to 20ns is “add”, from 20-30 is “subtract”, 30-40 is multiply and finally 40-50 is divide (as you can see I’m dividing 4 by 2).

One last test: I decided to compile this code for the Mimas board, just to see what kind of resources it would occupy on my FPGA.  I didn’t map any inputs and outputs, and I didn’t transfer this to the board since I don’t have enough DIP switches to represent S, A and B (though I’m sure I could get creative and use the push buttons for the “B” inputs or something).  Anyway, here is the result:

```Slice Logic Utilization:
Number of Slice Registers:                     0 out of  11,440    0%
Number of Slice LUTs:                         47 out of   5,720    1%
Number used as logic:                       47 out of   5,720    1%
Number using O6 output only:              38
Number using O5 output only:               0
Number using O5 and O6:                    9
Number used as ROM:                        0
Number used as Memory:                       0 out of   1,440    0%

Slice Logic Distribution:
Number of occupied Slices:                    19 out of   1,430    1%
Number of MUXCYs used:                         8 out of   2,860    1%
Number of LUT Flip Flop pairs used:           47
Number with an unused Flip Flop:            47 out of      47  100%
Number with an unused LUT:                   0 out of      47    0%
Number of fully used LUT-FF pairs:           0 out of      47    0%
Number of slice register sites lost
to control set restrictions:               0 out of  11,440    0%```

As you can see, 47 LUTs are used for the logic, as well as 19 slices.  This represents about 1% of the chip resources.  Not bad.  I’m betting that a multiplier scales up exponentially, so an 8-bit ALU is going to take up more than double the resources.  Let’s find out…

```Slice Logic Utilization:
Number of Slice Registers:                     0 out of  11,440    0%
Number of Slice LUTs:                        112 out of   5,720    1%
Number used as logic:                      112 out of   5,720    1%
Number using O6 output only:             101
Number using O5 output only:               0
Number using O5 and O6:                   11
Number used as ROM:                        0
Number used as Memory:                       0 out of   1,440    0%

Slice Logic Distribution:
Number of occupied Slices:                    42 out of   1,430    2%
Number of MUXCYs used:                        32 out of   2,860    1%
Number of LUT Flip Flop pairs used:          112
Number with an unused Flip Flop:           112 out of     112  100%
Number with an unused LUT:                   0 out of     112    0%
Number of fully used LUT-FF pairs:           0 out of     112    0%
Number of slice register sites lost
to control set restrictions:               0 out of  11,440    0%```

Hmmm… Only a little over double (2.38x).  Time to set up a multiply-only design and see what resources it takes to multiply two numbers together.  Here’s my basic code:

```entity multiplier is port (
A,B: in signed(3 DOWNTO 0);
Y: out signed(3 DOWNTO 0)
);
end multiplier;

architecture Behavioral of multiplier is

begin
Y <= RESIZE(A*B,4);
end Behavioral;```
```Slice Logic Utilization:
Number of Slice Registers:                     0 out of  11,440    0%
Number of Slice LUTs:                         15 out of   5,720    1%
Number used as logic:                       15 out of   5,720    1%
Number using O6 output only:              10
Number using O5 output only:               0
Number using O5 and O6:                    5
Number used as ROM:                        0
Number used as Memory:                       0 out of   1,440    0%```

That’s 15 LUTs to multiply two 4-bit numbers together.  8-bit numbers:

```Device Utilization Summary:

Slice Logic Utilization:
Number of Slice Registers:                     0 out of  11,440    0%
Number of Slice LUTs:                          0 out of   5,720    0%

Slice Logic Distribution:
Number of occupied Slices:                     0 out of   1,430    0%
Number of MUXCYs used:                         0 out of   2,860    0%
Number of LUT Flip Flop pairs used:            0

IO Utilization:
Number of bonded IOBs:                        24 out of     200   12%

Specific Feature Utilization:
Number of RAMB16BWERs:                         0 out of      32    0%
Number of RAMB8BWERs:                          0 out of      64    0%
Number of BUFIO2/BUFIO2_2CLKs:                 0 out of      32    0%
Number of BUFIO2FB/BUFIO2FB_2CLKs:             0 out of      32    0%
Number of BUFG/BUFGMUXs:                       0 out of      16    0%
Number of DCM/DCM_CLKGENs:                     0 out of       4    0%
Number of ILOGIC2/ISERDES2s:                   0 out of     200    0%
Number of IODELAY2/IODRP2/IODRP2_MCBs:         0 out of     200    0%
Number of OLOGIC2/OSERDES2s:                   0 out of     200    0%
Number of BSCANs:                              0 out of       4    0%
Number of BUFHs:                               0 out of     128    0%
Number of BUFPLLs:                             0 out of       8    0%
Number of BUFPLL_MCBs:                         0 out of       4    0%
Number of DSP48A1s:                            1 out of      16    6%
Number of ICAPs:                               0 out of       1    0%
Number of MCBs:                                0 out of       2    0%
Number of PCILOGICSEs:                         0 out of       2    0%
Number of PLL_ADVs:                            0 out of       2    0%
Number of PMVs:                                0 out of       1    0%
Number of STARTUPs:                            0 out of       1    0%
Number of SUSPEND_SYNCs:                       0 out of       1    0%```

Well, that’s interesting.  Apparently, there are 16 DSP modules and one of those was used for an 8-bit multiplier.  The same results from a 16-bit multiplier.  Let’s push it a little.  Here’s a 32-bit multiplier:

```Specific Feature Utilization:
Number of RAMB16BWERs:                         0 out of      32    0%
Number of RAMB8BWERs:                          0 out of      64    0%
Number of BUFIO2/BUFIO2_2CLKs:                 0 out of      32    0%
Number of BUFIO2FB/BUFIO2FB_2CLKs:             0 out of      32    0%
Number of BUFG/BUFGMUXs:                       0 out of      16    0%
Number of DCM/DCM_CLKGENs:                     0 out of       4    0%
Number of ILOGIC2/ISERDES2s:                   0 out of     200    0%
Number of IODELAY2/IODRP2/IODRP2_MCBs:         0 out of     200    0%
Number of OLOGIC2/OSERDES2s:                   0 out of     200    0%
Number of BSCANs:                              0 out of       4    0%
Number of BUFHs:                               0 out of     128    0%
Number of BUFPLLs:                             0 out of       8    0%
Number of BUFPLL_MCBs:                         0 out of       4    0%
Number of DSP48A1s:                            4 out of      16   25%
Number of ICAPs:                               0 out of       1    0%
Number of MCBs:                                0 out of       2    0%
Number of PCILOGICSEs:                         0 out of       2    0%
Number of PLL_ADVs:                            0 out of       2    0%
Number of PMVs:                                0 out of       1    0%
Number of STARTUPs:                            0 out of       1    0%
Number of SUSPEND_SYNCs:                       0 out of       1    0%```

No LUTs were used, but 4 DSPs were used.  For a 64-bit multiplier:

ERROR:Place:543 – This design does not fit into the number of slices available

Darn!  I had high hopes.  Oh well.  Now we know a limit of the Spartan-6 XC6SLX9 FPGA chip.

One other arithmetic function available is modulo (mod), which gives the remainder.  Let’s add that to the ALU:

```entity smallalu is port (
S: in integer range 0 to 15;
A,B: in signed(7 DOWNTO 0);
F: out signed(7 DOWNTO 0)
);
end smallalu;

architecture Behavioral of smallalu is
begin
process (S,A,B)
begin
case S is
when 0 =>
F <= to_signed(0,8);
when 1 =>
F <= A+B;
when 2 =>
F <= A-B;
when 3 =>
F <= RESIZE(A*B,8);
when 4 =>
F <= A/B;
when 5 =>
F <= A mod B;
when others =>

end case;
end process;
end Behavioral;```

As you can see from the simulation, 7/2 gives a remainder of 1:

Finally, here’s 8/2:
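For positive inputs like these, “mod” and a plain remainder agree, but VHDL also has “rem”, and the two differ once a negative operand is involved.  The Python sketch below models the rules as I understand them for signed values: “/” truncates toward zero, “rem” takes the sign of the left operand, and “mod” takes the sign of the right operand.  The function names are invented for the sketch.

```python
def vhdl_div(a, b):
    """Signed division that truncates toward zero."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def vhdl_rem(a, b):
    """Remainder with the sign of the left operand a."""
    return a - b * vhdl_div(a, b)


def vhdl_mod(a, b):
    """Modulo with the sign of the right operand b."""
    r = vhdl_rem(a, b)
    if r != 0 and (r < 0) != (b < 0):
        r += b
    return r


for a, b in [(7, 2), (8, 2), (-7, 2), (7, -2)]:
    print(a, b, "div", vhdl_div(a, b), "rem", vhdl_rem(a, b), "mod", vhdl_mod(a, b))
```

Like the VHDL operators, these helpers have no answer for B = 0 (Python raises ZeroDivisionError), so a real ALU would want to guard that case.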

Today is March 14th, which means that it’s PI day!  So I’m going to talk about computing the number PI.

Years ago I wanted to use my computer to calculate the number PI, and I assumed that it was just a matter of getting a formula for PI and running through a loop that computed each digit for as many digits as I wanted.  Nope.  Not that easy.  The problem is that PI is not computed digit by digit from left to right.  It’s computed with a formula, like everything else in this world, and you must create numbers and math functions that work with as many digits as you want PI to have.  For instance: if you want 1 million digits of PI, you need to be able to handle 1-million-digit numbers.  This includes the math functions such as add, subtract, multiply and whatever other functions you’ll need to compute PI.
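As a rough illustration of the idea, Python’s built-in integers are arbitrary precision, so a fixed-point version of a series formula fits in a few lines.  This sketch uses Machin’s formula, pi = 16·arctan(1/5) - 4·arctan(1/239), which I picked for brevity and which is not necessarily what any particular PI program uses.  The function names and the 10 guard digits are assumptions of the sketch.

```python
def arctan_inverse(x, one):
    """arctan(1/x) scaled by `one`, summed with the Taylor series
    1/x - 1/(3x^3) + 1/(5x^5) - ... using integer arithmetic only."""
    term = one // x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_digits(digits, guard=10):
    """Return PI as an integer: a 3 followed by `digits` decimal places.
    The extra guard digits absorb the rounding error of the series."""
    one = 10 ** (digits + guard)
    scaled_pi = 4 * (4 * arctan_inverse(5, one) - arctan_inverse(239, one))
    return scaled_pi // 10 ** guard


print(pi_digits(50))
```

Every add, subtract, multiply and divide in there operates on numbers as wide as the requested result, which is exactly why the run time climbs so quickly with the digit count.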

Recently I rediscovered the method of calculating PI when I stumbled across this article: How to calculate 1 million digits of pi.  What’s nice about this article is that it discusses the method of computing PI using C#.  The .NET Framework includes an arbitrary-size integer type called System.Numerics.BigInteger, and that’s what is used in the article.  I copied the code, compiled it, and ran it for different sizes of PI, which computed in the following times:

5000 digits = 0.05 seconds

10,000 digits = 0.2 seconds

100,000 digits = 17.748 seconds

1,000,000 digits = 31.5 minutes

Next I wanted to know how long it would take to compute PI to 10 million or 100 million digits.  So I plotted my timings onto a graph in Excel and performed a curve fit (using the power curve):

I purposely left out the 1,000,000 digit timing and I set the “Forecast” to 1,000,000 to see if it came out to 31.5 minutes.  As you can see from the above diagram the number of seconds is about 1900, which is 31.66 minutes. The curve fitting formula is: y=3E-09x^1.9495.
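The same kind of power-curve fit can be reproduced without Excel by doing a least-squares line fit on the logarithms of the data.  The sketch below uses only the three timings listed above, so its coefficient and exponent come out close to, but not exactly the same as, the Excel trendline.  The function names are made up for the example.

```python
import math


def power_fit(points):
    """Fit y = coeff * x ** exponent by linear regression on (ln x, ln y)."""
    log_x = [math.log(x) for x, _ in points]
    log_y = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    variance = sum((x - mean_x) ** 2 for x in log_x)
    exponent = covariance / variance
    coeff = math.exp(mean_y - exponent * mean_x)
    return coeff, exponent


# (digits, seconds) taken from the timings above
measured = [(5_000, 0.05), (10_000, 0.2), (100_000, 17.748)]
coeff, exponent = power_fit(measured)


def predict_seconds(digits):
    """Extrapolated run time in seconds for a given digit count."""
    return coeff * digits ** exponent


print(f"y = {coeff:.3g} * x^{exponent:.4f}")
print(f"1,000,000 digits: about {predict_seconds(1_000_000) / 60:.0f} minutes")
```

An exponent just under 2 says that ten times the digits costs a bit less than a hundred times the run time, which is where the long forecasts come from.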

Now, for:

10,000,000 digits = 36 hours

100,000,000 digits = 136 days

1 billion digits = 33 years

Oh what fun!  33 years!  I’m pretty sure I’ll have a faster computer long before that time is up.  I can’t even imagine running a personal computer 136 days straight.  I would have to “enhance” the program so that it can save intermediate values to the hard drive every once in a while so I can switch it off if I needed to, or to recover if the machine crashed or lost power.  Anyway, here’s 1 million digits of PI: pi_one_million_digits

My latest hobby is to learn VHDL and apply it to the Mimas V2 FPGA board.  As with any language, reading about the language is all well and good, but attempting a real application is where the rubber hits the road.  I read through several tutorials and some introduction material to familiarize myself with some of the syntax.  I copied some code from example articles and observed how the code worked in simulation mode.  Then I finally decided to try to build a circuit without the code from a tutorial.  Keeping it somewhat simple, I chose to implement a shift register.  My goal was to create a 4-bit shift register that had one input that I could feed a “1” or a “0” per clock cycle.  I also wanted to see all 4 outputs to observe the data bits shifting down the line.  Here’s the circuit:

My first attempt was to just shift the outputs as though they are storage locations.  That caused an error: Cannot read from ‘out’ object output ; use ‘buffer’ or ‘inout’  I discovered that I needed some sort of storage inside my object (the flip-flops that represent my last state).  So I set up a “signal”:

`signal dflipflops: STD_LOGIC_VECTOR(3 downto 0):="0000";`

You can give your signal any name, I just called it dflipflops because that’s what popped into my head.  As you can see, the data in the signal can be pre-set to some value (in quotes because it’s a vector).

Next, I coded the reset.  I didn’t really need a reset for the simulation since I set the dflipflops to all zeros when the object is initialized.  However, if I decided to use this as a real circuit, I’d have to have a way to reset this at any time.  So I coded my reset as simple as possible:

```if (reset = '1') then
dflipflops <= "0000";
else
-- shift logic goes here
end if;```

Next, I hard-coded the logic of shifting bits (this is just the inner logic):

```if (clock='1' and clock'event) then
dflipflops(3) <= dflipflops(2);
dflipflops(2) <= dflipflops(1);
dflipflops(1) <= dflipflops(0);
dflipflops(0) <= datain;
end if;```

I did this because I wanted to see if the simulation worked, and it didn’t.  I ended up with unknown outputs:

Yup, forgot to translate my signal back out to the outputs:

```output(0) <= dflipflops(0);
output(1) <= dflipflops(1);
output(2) <= dflipflops(2);
output(3) <= dflipflops(3);```

That worked:

Next, I converted to “for” loops and here’s the final code:

```library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- 4-bit shift register
entity shiftregister is port (
datain,clock,reset: in STD_LOGIC;
output: out STD_LOGIC_VECTOR(3 DOWNTO 0)
);
end shiftregister;

architecture Behavioral of shiftregister is

signal dflipflops: STD_LOGIC_VECTOR(3 downto 0):="0000";

begin
process (datain,clock,reset)
begin
if (reset = '1') then
dflipflops <= "0000";
else
if (clock='1' and clock'event) then
for i in 2 downto 0 loop
dflipflops(i+1) <= dflipflops(i);
end loop;
dflipflops(0) <= datain;
end if;
end if;

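-- Copy the register to the output port.  dflipflops is not in the
-- sensitivity list, so in simulation output only updates on the next
-- clock, datain or reset event (typically the following falling edge).
for k in 3 downto 0 loop
output(k) <= dflipflops(k);
end loop;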

end process;
end Behavioral;```

The test bench code is here:

```LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY shiftregistertest IS
END shiftregistertest;

ARCHITECTURE behavior OF shiftregistertest IS

-- Component Declaration for the Unit Under Test (UUT)

COMPONENT shiftregister
PORT(
datain : IN  std_logic;
clock : IN  std_logic;
reset : IN std_logic;
output : OUT  std_logic_vector(3 downto 0)
);
END COMPONENT;

--Inputs
signal datain : std_logic := '0';
signal clock : std_logic := '0';
signal reset : std_logic := '1';

--Outputs
signal output : std_logic_vector(3 downto 0);

-- Clock period definitions
constant clock_period : time := 10 ns;

BEGIN

-- Instantiate the Unit Under Test (UUT)
uut: shiftregister PORT MAP (
datain => datain,
clock => clock,
reset => reset,
output => output
);

-- Clock process definitions
clock_process :process
begin
clock <= '0';
wait for clock_period/2;
clock <= '1';
wait for clock_period/2;
end process;

-- Stimulus process
stim_proc: process
begin
-- hold reset state for 100 ns.
wait for 100 ns;

reset <= '0';

wait for clock_period*6;

-- test 1
datain <= '1';
wait for clock_period;

datain <= '0';
wait for clock_period*6;

-- test 2
datain <= '1';
wait for clock_period*2;

datain <= '0';
wait for clock_period*6;

wait;
end process;

END;```

For the test bench I first defaulted the reset to a “1” to force a reset at the beginning.  Then I set the reset back to “0” before testing data inputs.  The first test (test 1) feeds a “1” into the datain and then shifts it one clock cycle, then sets the datain back to “0”.  Then I shift 6 times to make the “1” shift all the way out of the shift register.  The next test (test 2), I set the datain to a “1” and shifted it in for two clock cycles, causing two “1”s to be inputted into the shift register.  Then I set datain back to “0” and shifted for 6 clock cycles to watch the two bits shift all the way through the shift register.  Here’s the simulation output:

The first test starts at 170ns and ends around 210ns.  The second test starts at 240ns and ends at 290ns.
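The bit patterns to expect from the two tests can be worked out with a few lines of Python.  This is only a behavioural model of the register described above (bit 0 takes datain, and every other bit takes its lower neighbour on each rising clock edge).  The names shift and run are made up for the sketch.

```python
def shift(state, datain):
    """One rising clock edge on a 4-bit register held in an int:
    bits move up one position and datain enters at bit 0."""
    return ((state << 1) | (datain & 1)) & 0b1111


def run(bits, state=0):
    """Clock in a sequence of input bits and record output(3 downto 0)
    after every edge as a 4-character string."""
    history = []
    for bit in bits:
        state = shift(state, bit)
        history.append(format(state, "04b"))
    return history


print(run([1, 0, 0, 0, 0, 0, 0]))   # test 1: a single "1"
print(run([1, 1, 0, 0, 0, 0, 0]))   # test 2: two "1"s back to back
```

Each list entry is the register after one clock, so a lone “1” should walk from the right-hand bit to the left-hand bit and then drop off the end.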

One thing I noticed about the editor is that you must select the test source file before double-clicking on “Simulate Behavioral Model”:

Otherwise, you’ll get a result like this:

You also need to close the ISim window before you can run another simulation; otherwise, you’ll get an error like:

ERROR:Simulator:904 – Unable to remove previous simulation file isim/shiftregistertest_isim_beh.exe.sim/shiftregistertest_isim_beh.exe. Please check if you have another instance of this simulation running on your system, terminate it and then recompile your design. System Error Message: boost::filesystem::remove: Access is denied: “isim\shiftregistertest_isim_beh.exe.sim\shiftregistertest_isim_beh.exe”

ERROR:Simulator:861 – Failed to link the design

Once this error occurs, you’ll need to close the ISim window, then right-click on “Simulate Behavioral Model” and select “Rerun All”.  Double-clicking just gives this error:

INFO:ProjectMgmt – The selected process was not run because a prior process failed.

One other thing I find annoying about the editor is that there is no file name change capability.  I’ve attempted to change the name of a file and ended up with a mess.  There is a lot of smart linking that goes on between the project and the files that belong to it.  My quick fix is to create a new file with the new name and scrape the code from the old source and paste into the new source.  Then I delete the old file.  It’s dumb and dirty, but it’s also pretty quick.

Other than the few quirks that I’ve worked around, I am happy that the editor is similar to Visual Studio in commands and syntax highlighting.

This is a continuation of my FPGA series.  I’m going to show how to create a one-bit full adder circuit using a schematic diagram in the Xilinx ISE tools.  Then I’m going to send the compiled code to the Mimas V2 Spartan-6 board and test it.  This project is for the entry-level FPGA programmer.  If you just purchased the Mimas V2 board and you’re unsure how to get started, this short tutorial should help with some of the entry-level basics.

To start this project, you’ll need to download and install the Xilinx Design Suite by clicking here.  Follow the instructions: you’ll need to register, answer some questions and install the software.  Go to “All Programs -> Xilinx Design Tools -> ISE Design Suite 14.7 -> ISE Design Tools -> 64-Bit Project Navigator”, then click on the “New Project” button.  If you’ve already installed the tools and wish to start a new project, you’ll need to go to the File menu and close the current project before starting a new one.

When the new project dialog starts, select “Schematic” as your top level project (I keep my FPGA projects in a directory on my D: drive):

Then give your project a name and click on the next button.  Then click next again.

You’ll need to create a schematic. Right click on the FPGA board designation node and select New Source:

Click on “Schematic” and Next, Next.

Now you can draw a schematic.  First you’ll need to find logic gates that you want to use.  If you click on the “Symbols” tab (bottom tab of the palette window):

You can find logic gates like and2 which is the 2-input AND gate.  Draw a full adder circuit.  The basic logic for a full adder is like this:

You can plot the XOR2, OR2 and AND2 gates first, then use the wire tool to connect outputs to inputs.  Next you’ll need to add connectors to the A, B, Cin, S and Cout points.  Your final circuit should look something like this:
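Before wiring the gates up, it can help to check the logic in software.  The Python sketch below assumes the usual full adder arrangement of two XOR gates, two AND gates and one OR gate, which matches the gate types named above.  It leaves out the output inverters and enable pins, which are specific to the board.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND and OR operations."""
    partial = a ^ b                               # first XOR2
    total = partial ^ carry_in                    # second XOR2 gives the sum
    carry_out = (a & b) | (partial & carry_in)    # two AND2s feeding one OR2
    return total, carry_out


print(" A B Cin | S Cout")
for a in (0, 1):
    for b in (0, 1):
        for carry_in in (0, 1):
            total, carry_out = full_adder(a, b, carry_in)
            print(f" {a} {b}  {carry_in}  | {total}  {carry_out}")
```

The printed table is the truth table the switches and LEDs should reproduce once the design is on the board.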

I added three VCC connections to the enable pins of the LED displays since I won’t be using them. I also had to invert the outputs because the LEDs on the Mimas board are active low.  You can use an XNOR gate instead of an XOR and inverter and you can use a NOR gate instead of the OR and inverter.  I added inverters after I generated the code, sent it to the board and discovered that the LEDs lit opposite of what I expected.

The connectors (A, B, CARRY_IN, SUM, CARRY_OUT, EN1, EN2 and EN3 in the diagram above) can be added from the palette to indicate inputs and outputs.  You’ll need to give them a useful name so you can designate a physical pin to connect them to.  Double-click on the connector object and you’ll see this screen:

Now click on the name under the “Nets” branch and then change the “Name” to the value that you want.  You can also change the connector to be an input or output.

To define the connectors to a physical pin, you’ll need to add a ucf file to your project.  Right-click on the FPGA node again (under the Design tab) and add a New Source.  This time select an Implementation Constraints File:

Then add the following definitions to connect the circuit to your FPGA board:

```NET "SUM" LOC = P15;
NET "CARRY_OUT" LOC = P16;

NET "A" PULLUP;
NET "B" PULLUP;
NET "CARRY_IN" PULLUP;
NET "EN1" PULLUP;
NET "EN2" PULLUP;
NET "EN3" PULLUP;

NET "A" LOC = F17;
NET "B" LOC = F18;
NET "CARRY_IN" LOC = E16;

NET "EN1" LOC = B3;
NET "EN2" LOC = A2;
NET "EN3" LOC = B2;```

You can look up the physical connection names for the Mimas V2 board from this website: Mimas V2 Spartan 6 FPGA Development Board With DDR SDRAM.

Next, you’ll need to select your FPGA board node and then in the processes section, you can right-click on “Generate Programming File” and select “Process Properties”.  Make sure the “Create Binary file” check box is checked.  Then right-click “Generate Programming File” again and select “Rerun All”.  That will compile and generate all the files you’ll need for your board.

You’ll need to select the COM port that corresponds to the USB port that you are using for your board.  To find that, you can open the Windows “Device Manager” and look it up.  You’ll need to plug in your Mimas board for this to show up.  You’ll see an entry in the “Ports (COM & LPT)” section:

You can see the “Numato Lab Mimas…” line, and it is using COM3.  So select the COM port that your computer is set up for in the Configuration Tool.

Next, click on the “Open File” button and navigate to your project folder and click on the “.bin” file.

Then click the “Program” button and wait for the “Done” status.

Now you can use the dip switches to turn on/off the A, B and Carry in signals:

This is a view with A, B and Carry In set to a “1”.  The LEDs represent D1=Sum and D2=Carry.  As you can see, they are both lit.

After finishing the beginner’s guide for the Mimas V2 Development Board, I decided to try my hand at a circuit that I previously built with the GAL16V8 chip.  I already had the PDS file for the BCD to Hexadecimal LED display, so I decided to try that.  First, I needed to figure out what I could use for inputs.  The Mimas board has an 8-position DIP switch.  I used switch 1 through 4 for the BCD input.  I also used push button switch number 1 for the lamp test input.  If you go to this website, you can find all the schematics that show inputs and outputs.  Here are the inputs:

According to the diagram above, I chose to use M18 for the lamp test and dip switches F17, F18, E16, E18 for my BCD inputs.  All switches are active low, so I needed pull-up resistors and had to reverse the input logic to the module (a zero is inputted when the switch is ON).

Next, the outputs.  There are three 7-segment displays, so I chose display 1.  To enable the display, I applied a zero to the enable transistor.  Also, I had to disable the other two displays by applying a “1”.  Here’s the schematic:

So B3 must be set to “0” and A2 and B2 must be set to a “1”.  Then segment A is A3, segment B is B4, segment C is A4, etc.  The UCF file for the program will look like this:

```# User Constraint File for BCD to 7-Segment Hex Display implementation on Mimas V2

NET "A" LOC = A3;
NET "B" LOC = B4;
NET "C" LOC = A4;
NET "D" LOC = C4;
NET "E" LOC = C5;
NET "F" LOC = D6;
NET "G" LOC = C6;
NET "EN1" LOC = B3;
NET "EN2" LOC = A2;
NET "EN3" LOC = B2;

# Switches.
# Internal pull-ups need to be enabled since
# there is no pull-up resistor available on board
NET "D0" PULLUP;
NET "D1" PULLUP;
NET "D2" PULLUP;
NET "D3" PULLUP;
NET "LT" PULLUP;

NET "D0" LOC = F17;
NET "D1" LOC = F18;
NET "D2" LOC = E16;
NET "D3" LOC = E18;
NET "LT" LOC = M18;```

Now, it’s time to create the logic.  I discovered the AND, OR, NOR, etc. gate primitives from this tutorial: Verilog Tutorial.  So I created OR/AND combinations for each segment like this:

```and (a1, D0, D2);
and (a2, D0, !D3);
and (a3, !D1, !D2);
and (a4, !D1, D2, D3);
and (a5, !D0, !D2, D3);
and (a6, D1, D2, !D3);
nor (A, a1, a2, a3, a4, a5, a6, !LT);```

This came from the following formula used in my GAL16V8 circuit:

`/A = /RBO*/D0*/D2 + /RBO*/D0*D3 + /RBO*D1*D2 + /RBO*D1*/D2*/D3 + /RBO*D0*D2*/D3 + /RBO*/D1*/D2*D3 + LT`

I did not implement the RBO (Ripple Blanking Output), so I removed all the /RBO terms.  Then I flipped every input: each “NOT” became positive and each positive became “NOT”, because the switches feed a zero into the module when they are ON.  Instead of !D0 AND !D2, the first term is D0 AND D2.  Last, I ORed the lamp test (LT) with the results.  The LT button is also inverted because pressing the button produces a zero, so you’ll notice that there is a !LT input on the final gate of each segment.  Since the segments are active low, I had to change all of my ORs to NORs (the display looked rather interesting with the segments inverted).
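All that inverting is easy to get wrong, so here is a small Python check of the segment A logic at the pin level.  It mirrors the and/nor lines shown above, and it assumes that a switch in the ON position (a binary 1 in the digit) reads as 0 at the pin and that a segment lights when its pin is 0.  The function names are invented for the check.

```python
def segment_a_pin(D0, D1, D2, D3, LT):
    """Pin-level model of the segment A gates: returns 0 when the
    segment is lit (active low), 1 when it is dark."""
    def inv(v):
        return 1 - v

    terms = [
        D0 & D2,
        D0 & inv(D3),
        inv(D1) & inv(D2),
        inv(D1) & D2 & D3,
        inv(D0) & inv(D2) & D3,
        D1 & D2 & inv(D3),
        inv(LT),                      # lamp test button reads 0 when pressed
    ]
    return 0 if any(terms) else 1     # NOR of all the terms


def lit_digits():
    """Hex digits for which segment A lights with the lamp test released."""
    lit = []
    for digit in range(16):
        # Each pin is the inverse of the corresponding bit of the digit.
        pins = [1 - ((digit >> i) & 1) for i in range(4)]
        if segment_a_pin(*pins, LT=1) == 0:
            lit.append(digit)
    return lit


print([format(d, "X") for d in lit_digits()])
```

Segment A is the top bar, so it should stay dark for 1, 4, b and d and light for every other hex digit.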

The last part of the logic was forcing which display was on:

assign EN1 = 0; // enable digit 1
assign EN2 = 1; // disable digit 2
assign EN3 = 1; // disable digit 3

I wasn’t sure how large each Spartan 6 logic block was, so I just typed this in and expected the program to give me an error if I exceeded the number of AND/OR gates available.  Apparently, I didn’t reach that limit.  The program console shows that my program used 4 LUTs out of 5,720:

```Slice Logic Utilization:
Number of Slice Registers:                     0 out of  11,440    0%
Number of Slice LUTs:                          4 out of   5,720    1%
Number used as logic:                        4 out of   5,720    1%
Number using O6 output only:               1
Number using O5 output only:               0
Number using O5 and O6:                    3
Number used as ROM:                        0
Number used as Memory:                       0 out of   1,440    0%```

Apparently, the LUT (Look Up Table) is used to represent combinatorial circuits.  There is a description of the LUT in this document: Spartan-6 FPGA Configurable Logic Block User Guide

The function generators in Spartan-6 FPGAs are implemented as six-input look-up tables (LUTs). There are six independent inputs (A inputs – A1 to A6) and two independent outputs (O5 and O6) for each of the four function generators in a slice (A, B, C, and D). The function generators can implement any arbitrarily defined six-input Boolean function.

Basically, the LUT is a logic table representation of my discrete boolean logic.  Technically, I could have provided a lookup table definition, similar to mapping out a circuit in Read Only Memory.
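A quick way to picture this is to precompute a Boolean function for every input combination and then “evaluate” it by indexing into the table.  The Python sketch below does that for up to six inputs.  It is only a software analogy of a LUT, with made-up helper names, and says nothing about how the synthesis tool actually packs logic.

```python
def build_lut(func, inputs=6):
    """Precompute func for every input combination.  Entry i holds the
    output for the inputs whose bits spell out the address i."""
    table = []
    for address in range(1 << inputs):
        bits = [(address >> k) & 1 for k in range(inputs)]
        table.append(func(*bits))
    return table


def lut_read(table, *bits):
    """Evaluate the stored function by using the inputs as an address."""
    address = sum(bit << k for k, bit in enumerate(bits))
    return table[address]


# A 3-input example: the sum output of a full adder is a 3-way XOR.
sum_lut = build_lut(lambda a, b, c: a ^ b ^ c, inputs=3)
print(sum_lut)
print(lut_read(sum_lut, 1, 1, 1))
```

With six inputs the table has 64 entries, so any six-input function costs the same single lookup no matter how many gates it would take to draw.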

The Code

Here’s the full listing of the UCF file:

```# User Constraint File for BCD to 7-Segment Hex Display implementation on Mimas V2

NET "A" LOC = A3;
NET "B" LOC = B4;
NET "C" LOC = A4;
NET "D" LOC = C4;
NET "E" LOC = C5;
NET "F" LOC = D6;
NET "G" LOC = C6;
NET "EN1" LOC = B3;
NET "EN2" LOC = A2;
NET "EN3" LOC = B2;

# Switches.
# Internal pull-ups need to be enabled since
# there is no pull-up resistor available on board
NET "D0" PULLUP;
NET "D1" PULLUP;
NET "D2" PULLUP;
NET "D3" PULLUP;
NET "LT" PULLUP;

NET "D0" LOC = F17;
NET "D1" LOC = F18;
NET "D2" LOC = E16;
NET "D3" LOC = E18;
NET "LT" LOC = M18;```
And the V file:
```module HexLED(D0,D1,D2,D3,LT, A, B, C, D, E, F, G, EN1, EN2, EN3);
input wire D0;
input wire D1;
input wire D2;
input wire D3;
input wire LT;

output wire A;
output wire B;
output wire C;
output wire D;
output wire E;
output wire F;
output wire G;
output wire EN1;
output wire EN2;
output wire EN3;

// /D0*/D2 + /D0*D3 + D1*D2 + D1*/D2*/D3 + D0*D2*/D3 + /D1*/D2*D3
and (a1, D0, D2);
and (a2, D0, !D3);
and (a3, !D1, !D2);
and (a4, !D1, D2, D3);
and (a5, !D0, !D2, D3);
and (a6, D1, D2, !D3);
nor (A, a1, a2, a3, a4, a5, a6, !LT);

// /D2*/D3 + /D0*/D2 + /D0*/D1*/D3 + D0*D1*/D3 + D0*/D1*D3
and (b1, D2, D3);
and (b2, D0, D2);
and (b3, D0, D1, D3);
and (b4, !D0, !D1, D3);
and (b5, !D0, D1, !D3);
nor (B, b1, b2, b3, b4, b5, !LT);

// D0*/D1 + D0*/D2 + /D1*/D2 + D2*/D3 + /D2*D3
and (c1, !D0, D1);
and (c2, !D0, D2);
and (c3, D1, D2);
and (c4, !D2, D3);
and (c5, D2, !D3);
nor (C, c1, c2, c3, c4, c5, !LT);

// /D0*/D1*D3 + /D0*/D2*/D3 + D0*D1*/D2 + /D0*D1*D2 + D0*/D1*D2
and (d1, D0, D1, !D3);
and (d2, D0, D2, D3);
and (d3, !D0, !D1, D2);
and (d4, D0, !D1, !D2);
and (d5, !D0, D1, !D2);
nor (D, d1, d2, d3, d4, d5, !LT);

// /D0*/D2 + D2*D3 + /D0*D1 + D1*D3
and (e1, D0, D2);
and (e2, !D2, !D3);
and (e3, D0, !D1);
and (e4, !D1, !D3);
nor (E, e1, e2, e3, e4, !LT);

// /D0*/D1 + /D2*D3 + D1*D3 + /D0*D2 + /D1*D2*/D3
and (f1, D0, D1);
and (f2, D2, !D3);
and (f3, !D1, !D3);
and (f4, D0, !D2);
and (f5, D1, !D2, D3);
nor (F, f1, f2, f3, f4, f5, !LT);

// D1*/D2 + D0*D3 + /D2*D3 + /D0*D1 + /D1*D2*/D3
and (g1, !D1, D2);
and (g2, !D0, !D3);
and (g3, D2, !D3);
and (g4, D0, !D1);
and (g5, D1, !D2, D3);
nor (G, g1, g2, g3, g4, g5, !LT);

assign EN1 = 0; // enable digit 1
assign EN2 = 1; // disable digit 2
assign EN3 = 1; // disable digit 3

endmodule```
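
As a rough illustration of the lookup-table idea mentioned above, here is a sketch that compares segment A written as gates against the same function stored as a 16-bit table. The module names (`seg_a_gates`, `seg_a_lut`, `seg_a_tb`) are hypothetical, and the table constant `16'h4814` is something I worked out by hand from the gate equations for A, indexing the table with {D3, D2, D1, D0}. This is not what the synthesis tools actually emit, just a way to see that the two forms describe the same logic.

```verilog
// Segment A from the listing above, restated as a single expression.
module seg_a_gates(input wire D0, D1, D2, D3, LT, output wire A);
    assign A = ~((D0 & D2) | (D0 & ~D3) | (~D1 & ~D2) | (~D1 & D2 & D3) |
                 (~D0 & ~D2 & D3) | (D1 & D2 & ~D3) | ~LT);
endmodule

// The same function as a 16-entry lookup table (hand-derived, an assumption).
// Bits 2, 4, 11 and 14 are set: those are the input patterns where A is high.
module seg_a_lut(input wire D0, D1, D2, D3, LT, output wire A);
    wire [15:0] rom = 16'h4814;
    wire [3:0]  idx = {D3, D2, D1, D0};
    assign A = LT & rom[idx];   // LT low forces the segment output low
endmodule

// Simulation-only testbench: walks all 32 input combinations and
// counts any disagreement between the two versions.
module seg_a_tb;
    reg  [4:0] v;               // {LT, D3, D2, D1, D0}
    wire a_gates, a_lut;
    integer i, errors;

    seg_a_gates g(.D0(v[0]), .D1(v[1]), .D2(v[2]), .D3(v[3]), .LT(v[4]), .A(a_gates));
    seg_a_lut   l(.D0(v[0]), .D1(v[1]), .D2(v[2]), .D3(v[3]), .LT(v[4]), .A(a_lut));

    initial begin
        errors = 0;
        for (i = 0; i < 32; i = i + 1) begin
            v = i;
            #1;
            if (a_gates !== a_lut) errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

The table version is closer to how I picture the FPGA storing the function, while the gate version is closer to how I designed it on paper.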

I recently purchased an FPGA development board, specifically the Mimas V2 Spartan 6 board.  There’s a really good introduction to this board which you can access by clicking here.  The board looks like this:

You can connect a micro-USB to USB cable to this board and program it from your PC.  Plus, the power from the USB can power the board for small circuits (there’s a power adapter that did not come with the board that you can use if the power requirements become too high).  I spent an evening going through the basic “hello world” circuit in the beginners guide and I was able to create a circuit that lit one of the small LEDs when pushing a button (I mentioned that this was the “hello world” circuit right?).  What motivated me to buy this board?

Programmable Logic – A Short History

Way back in the late 80s (or 1978 according to Wiki) the PAL was invented.  This is a Programmable Array Logic device that is similar to the GALs that I used in my previous posts.  The basic chip came with an array of fuses that connected inputs to AND gates, and you could blow fuses to “program” which input pins activated which AND gates.  Later, Generic Array Logic devices replaced these devices by substituting erasable links (electrically erasable floating gates) for the fuses.  These devices can be reprogrammed hundreds of times.  A single PAL or GAL can replace several TTL logic chips.

As feature sizes shrank and more transistors could fit on a chip, new devices were invented.  CPLDs or Complex Programmable Logic Devices were nothing more than a dozen or more PALs on a chip with data buses that can be programmed to connect your sub-circuits (or modules).  The PAL-like structures on CPLDs are referred to as Macro Cells or Generic Logic Blocks.  Here’s an example of a logic block:

These blocks are wired together from a Global Routing Pool.  For the CPLD I used here, there are 16 GLBs connected with one large routing pool.

Along the same lines as the CPLD is the FPGA or Field Programmable Gate Array.  These devices are more complex than the CPLD and they are designed to be field programmable, usually loading their configuration from flash memory.  The first commercial FPGA, the Xilinx XC2064, appeared in 1985.  FPGAs can contain logic blocks, memory blocks, shift registers, multiplexers and other circuits.  An entire system can be created on a single chip.  The Spartan 6 XC6SLX9, used in the Mimas board I mentioned earlier, has 9,152 logic cells and 576 Kb of block RAM.  One of the largest Spartan FPGAs available today has 147,443 logic cells and over 4 Mb of block RAM.

Using Verilog HDL

The circuitry inside an FPGA is too complex to be designed using fuse maps and schematics.  To make the job of creating a complex circuit easier, the Spartan FPGA is programmed using a hardware description language (HDL) such as Verilog.  Verilog has a “C”-like syntax and the Xilinx design tools editor is similar to Visual Studio.  This website has a pretty good tutorial on Verilog: ASIC World: Verilog.  I would highly recommend following the Numato Lab beginners guide to get used to the tools and the FPGA board.
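
To give a feel for the syntax, here is a minimal sketch along the lines of the button-and-LED “hello world” circuit I mentioned earlier. The names `hello_led`, `btn_n` and `led` are hypothetical, and I’m assuming the button reads low when pressed and the LED lights on a high output; check the board documentation for the real polarity and pin names before using anything like this.

```verilog
// Hypothetical "hello world": light an LED while a button is held down.
// Assumes btn_n is low when pressed and led is on when driven high.
module hello_led(
    input  wire btn_n,  // push button, assumed active low
    output wire led     // LED, assumed active high
);
    assign led = ~btn_n;
endmodule

// Simulation-only testbench that exercises both button states.
module hello_led_tb;
    reg  btn_n;
    wire led;

    hello_led dut(.btn_n(btn_n), .led(led));

    initial begin
        btn_n = 1'b1; #1 $display("released: led=%b", led);
        btn_n = 1'b0; #1 $display("pressed:  led=%b", led);
        $finish;
    end
endmodule
```

On real hardware the module ports would also need pin assignments in a UCF file.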

The Mimas V2 Spartan-6 Board and Chip

When working on a circuit for the Mimas board, this site is useful for looking up inputs and outputs: Mimas V2 Spartan 6 FPGA Development Board with DDR SDRAM.  All of the Spartan-6 chip specifications are located at the Xilinx site here: Xilinx Spartan-6 Documentation.  As I mentioned earlier, the XC6SLX9 chip is used on the Mimas board.  The board schematics can be found here: Mimas V2 Schematics.

If you dig through the specifications for the Spartan FPGA, you’ll discover the CLB (Configurable Logic Block) organization which starts with this diagram:

The chip contains hundreds of these blocks, of which 4 are shown in the diagram above.  Each block contains two slices.  Each slice is one of three different circuit types: SLICEX, SLICEL or SLICEM, each with progressively more circuitry.  This table lists the features:

Here’s the diagram for the SLICEM:

You can refer to the documentation here to see the diagrams for the SLICEX and SLICEL.  In the SLICEM above, the 4 boxes on the left are the LUTs or Look Up Tables.  These can be used for combinatorial circuits on their own, or they can be programmed to connect any of the logic in the diagram into a complex subsystem.

Although the documentation lists the XC6SLX9 as having 9,152 logic cells, there are only 1,430 slices, and at two slices per CLB that means there are 715 CLBs.  The logic cell figure appears to be a capacity rating rather than a count of physical blocks: 1,430 slices × 6.4 = 9,152, which suggests each slice is rated as 6.4 logic cells.  Further into the documentation is the diagram that shows how all the CLBs are connected together into an array:

You describe your design as modules, and the tools map that logic onto the CLBs and route the connections between them according to your defined inputs and outputs.  This makes it easier to design a large circuit by defining smaller sub-circuits, one module at a time.
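
The idea of building a larger circuit out of smaller modules can be sketched with a classic example: a full adder made from two half adders. The module names here (`half_adder`, `full_adder`, `full_adder_tb`) are hypothetical and have nothing to do with the Mimas board; this is only meant to show one module instantiating another and wiring them together.

```verilog
// Smallest sub-circuit: adds two bits.
module half_adder(input wire a, b, output wire sum, carry);
    assign sum   = a ^ b;
    assign carry = a & b;
endmodule

// Larger circuit built by connecting two half adders with internal wires.
module full_adder(input wire a, b, cin, output wire sum, cout);
    wire s1, c1, c2;
    half_adder ha1(.a(a),  .b(b),   .sum(s1),  .carry(c1));
    half_adder ha2(.a(s1), .b(cin), .sum(sum), .carry(c2));
    assign cout = c1 | c2;
endmodule

// Simulation-only testbench: checks all 8 input combinations
// against a plain arithmetic sum.
module full_adder_tb;
    reg  [2:0] v;               // {cin, b, a}
    reg  [1:0] expected;
    wire sum, cout;
    integer i, errors;

    full_adder dut(.a(v[0]), .b(v[1]), .cin(v[2]), .sum(sum), .cout(cout));

    initial begin
        errors = 0;
        for (i = 0; i < 8; i = i + 1) begin
            v = i;
            expected = v[0] + v[1] + v[2];
            #1;
            if ({cout, sum} !== expected) errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

The tools decide how this logic gets packed into LUTs and slices, so the module boundaries are for my benefit, not the chip’s.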

So far, I’m only scratching the surface of what this device can do.  All I can think of is that this one chip can represent circuits beyond all the TTL chips I have stored in my box of parts (and I have quite the large collection of chips).  The price for the Mimas board is only $50 (I purchased it from here).  The Xilinx software used in the beginners guide is free, though you’ll have to fill out some information to download it (go here).

I’ll be following up with more posts on this subject as I learn what this board can do.