Commit Graph

242 Commits

Author SHA1 Message Date
Jordan Niethe dd4d4ea0ad virtio-serial: Do not close stdout on quiesce
Commit 76fee95 ("slof: Only close stdout for virtio-serial devices")
says that commit cf28264 ("virtio-serial: Rework shutdown sequence")
fixed a hang. The problem was believed to be that it was necessary to
close stdout to shutdown the underlying virtio device.

Commit cf28264 ("virtio-serial: Rework shutdown sequence") closed stdout
on quiesce. This meant that when prom_init() called write on stdout after
quiesce, there was a use after free, which made it unreliable and could also
hang (especially after reboots).

Quiescing is intended to put hardware into a safe state for the client
to take over. It is incorrect for SLOF to close ihandles that the client
could still be using, even after a quiesce.

Rather than closing the stdout device, all that needs to happen is to
ensure virtio-serial-shutdown gets called. On quiesce, close the virtio
device, but leave the stdout device itself open.

Commit 8174acd ("virtio-serial: Close device completely") handles reads
and writes as no-ops if the underlying virtio device is closed so there
is no problem with the client calling "write" on stdout after this, but
no output will be displayed.

Fixes: cf28264 ("virtio-serial: Rework shutdown sequence")
Debugged-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
Co-developed-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2023-09-18 18:20:45 +10:00
Kautuk Consul 63b66a5147 virtio-serial: Make read and write methods report failure
The read and write methods return successfully even if the virtio device
is closed (virtiodev is 0) and it is not able to send or receive any
characters.

Make the read and write methods return 0 to indicate they did not
succeed in this case.
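
A minimal sketch of the intended behaviour, written in Python rather than
Forth. The class and method names are hypothetical and are not the SLOF
words; the only rule taken from the text above is that a closed device
(virtiodev is 0) transfers nothing and reports 0.

```python
class VirtioSerialSketch:
    """Hypothetical model of the driver state, not the SLOF code."""

    def __init__(self):
        self.virtiodev = 0        # 0 means the virtio device is closed
        self.sent = bytearray()   # bytes "transmitted" while open
        self.rx = bytearray()     # bytes waiting to be read

    def open(self):
        self.virtiodev = 1        # any non-zero value stands for "open"

    def shutdown(self):
        self.virtiodev = 0

    def write(self, data: bytes) -> int:
        """Return the number of bytes written, 0 if the device is closed."""
        if self.virtiodev == 0:
            return 0
        self.sent += data
        return len(data)

    def read(self, length: int) -> int:
        """Return the number of bytes read, 0 if the device is closed."""
        if self.virtiodev == 0:
            return 0
        count = min(length, len(self.rx))
        del self.rx[:count]
        return count
```

A caller can then tell "nothing was transferred" apart from success by
checking the returned count.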

This also fixes an invalid stack access in the read method.

Fixes: 8174acd ("virtio-serial: Close device completely")
Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2023-09-18 18:20:45 +10:00
Thomas Huth 9bbdd35a27 Fix typos in the board-qemu folder
Found with the "codespell" utility.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2023-02-28 15:23:05 +11:00
Jordan Niethe b3f699c06a OF: Add a separate direct kernel loading word
Currently, go-64 is used for booting a kernel from qemu (i.e. -kernel).
However, there is an expectation from users that this should be able to
boot not just vmlinux kernels but things like zImages too.

The bootwrapper of a BE zImage is a 32-bit ELF. Attempting to load that
with go-64 means that it will be run with MSR_SF set (64-bit mode). This
crashes early in boot (usually due to what should be 32-bit operations
being done with 64-bit registers eventually leading to an incorrect
address being generated and branched to).

Note that our 64-bit payloads are prepared to enter with MSR_SF cleared
and set it themselves very early.

Add a new word named go-direct that will execute any simple payload
in-place and will enter with MSR_SF cleared. This allows booting a BE
zImage from qemu with -machine kernel-addr=0.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2022-07-19 12:54:08 +10:00
Stefan Berger 6c0fcd9f30 tpm: Add firmware API call 2HASH-EXT-LOG
Add a new firmware API call with the name 2HASH-EXT-LOG that will be used
by trusted grub for measuring, logging, and extending TPM PCRs.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2021-07-11 23:32:28 +10:00
Thomas Huth 1768e27885 Fix bad header guard in version.h
The #define in version.h does not match the #ifndef in the line before
due to a typo in the suffix ("_F" instead of "_H"). Fix it, and while
we're at it, also remove the underscore at the beginning to avoid that
we're using an identifier here that is reserved by the C standard.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2021-06-15 16:27:41 +10:00
Thomas Huth b7ea243afd virtio-serial: Remove superfluous serial-* words
These likely were a blind copy-n-paste from hvterm.fs, but they
simply do not make any sense in virtio-serial.fs. The hvterm.fs is
always included from OF.fs, so the serial-* words are globally there.
virtio-serial.fs is only used within the virtio-serial device tree
nodes, so adding the serial-* words there is just superfluous.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2021-01-27 21:24:17 +11:00
Alexey Kardashevskiy 8f21e1eb81 fdt: Avoid recursion when traversing tree
A loop over peers does not need recursion, which becomes a problem with
hundreds of devices.

This was discovered with "-smp 2048,cores=512,threads=4".
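
The idea can be sketched in Python (not the SLOF Forth code): siblings
are walked with a plain loop and only children are visited recursively,
so the call depth follows the tree depth instead of the number of peers.
The Node layout and function names are assumptions made for this
illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    child: Optional["Node"] = None   # first child
    peer: Optional["Node"] = None    # next sibling


def count_nodes(first: Optional[Node]) -> int:
    """Count a node, its peers and all their descendants."""
    total = 0
    node = first
    while node is not None:                  # peers: iterate, do not recurse
        total += 1
        total += count_nodes(node.child)     # recursion depth == tree depth
        node = node.peer
    return total


def make_peers(n: int) -> Optional[Node]:
    """Build a chain of n sibling nodes (hypothetical CPU nodes)."""
    head = None
    for i in range(n - 1, -1, -1):
        head = Node(f"cpu@{i}", peer=head)
    return head
```

With 5000 siblings under one parent, a traversal that recursed per peer
would exceed Python's default recursion limit, while this loop needs
only a few stack frames.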

Suggested-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-17 11:05:52 +10:00
Gustavo Romero 6184ca06c8 board-qemu: Fix comment about SLOF start address
On QEMU pseries (and similar environments) the PC starts at 0x100, hence SLOF
starts at address 0x100, not at 0x0 as the current comment says. With that
fix, the comment also matches the comment above it about the __start
load position, which is correct.

Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-06-24 10:30:28 +10:00
Greg Kurz 76fee95890 slof: Only close stdout for virtio-serial devices
Recent commit cf28264196 fixed an issue where a virtio-serial device
wouldn't shutdown properly during quiesce. The fix is to close stdout
just before quiesce. As expected this causes some messages to not
appear anymore, like the well known ones from prom_init():

Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000002000000 ...

Actually all messages are discarded until the OS driver finally takes
control of the device, which may represent a fair amount of logging.
This is suboptimal but this is still better than hanging in SLOF.

The hammer is a bit too big though because the change also affects
spapr-vty based consoles, which have no reason to stop working
after quiesce.

Move the hack from the common code to the virtio-serial code so that
it doesn't affect other device types anymore. Register a quiesce hook
that closes stdout in virtio-serial.fs.

While here, as suggested by Segher, bring back some robustness in the
shutdown method.

Reported-by: Fabiano Rosas <farosas@linux.ibm.com>
Fixes: cf28264196 "virtio-serial: Rework shutdown sequence"
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-03-27 11:55:00 +11:00
Alexey Kardashevskiy 51245a15fe rtas: Move FWNMI log space reservation to QEMU
This reverts commit 674d0d0cf6 ("rtas: Reserve space for FWNMI log")
which expanded the RTAS blob size to match the QEMU expectation about
the RTAS area available for FWNMI logs.

Instead, it relies on QEMU passing the "rtas-size" property and passes it
through untouched. This adds a check that QEMU allocated enough for the
RTAS blob. This adds a fallback to the default 20 bytes "rtas-size" if
none is specified by QEMU.
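
A rough Python sketch of the check and the fallback described above. The
function name and the dictionary standing in for the /rtas node are
hypothetical; only the property name and the 20-byte default come from
the text.

```python
DEFAULT_RTAS_SIZE = 20  # fallback named in the text above, in bytes


def effective_rtas_size(rtas_props: dict, blob_len: int) -> int:
    """Return the size to advertise, using QEMU's value when present."""
    size = rtas_props.get("rtas-size", DEFAULT_RTAS_SIZE)
    if size < blob_len:
        # QEMU did not allocate enough room for the RTAS blob
        raise ValueError(f"rtas-size {size} < RTAS blob length {blob_len}")
    return size
```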

While we are here, replace 's" /rtas" find-node' with 'rtas-node' which
we just set above.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-03-17 09:42:50 +11:00
Alexey Kardashevskiy 8174acd8c8 virtio-serial: Close device completely
Linux closes stdout at the end of prom_init which triggers the FW quiesce
code which closes the virtio-serial instance. This misses stopping the
virtio queues. However, this seemed to work for a little longer (until the
Linux driver took over) until 300384f3dc moved the VQ descriptors
around, which caused use-after-free corruption.

This adds virtio_queue_term_vq(), cleanup in the forth driver and a few
checks.

Fixes: 300384f3dc ("virtio: Store queue descriptors in virtio_device")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[groug: - fix changelog
        - don't restore emit]
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-03-11 15:43:22 +11:00
Greg Kurz cf28264196 virtio-serial: Rework shutdown sequence
The "io" word of term-io.fs opens two separate instances of the device
for stdin and stdout. The prom_init() function in Linux closes stdin at
some point, which internally calls quiesce and shuts the device down
through a quiesce hook.

When the "open-count" variable in virtio-serial.fs reaches 0, i.e. when
closing the last instance, we call "close" two times, which is clearly
wrong. This never hits, however, because the stdout instance is never
closed, which prevents "open-count" from reaching 0.

It would make more sense to shutdown the device when closing the last
instance, for symmetry with the first open that initializes the device.
Change the shutdown sequence to do that rather than relying on a quiesce
hook.

Have quiesce explicitly close stdout, which is supposedly the last
instance, and shutdown the device.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-03-11 15:43:22 +11:00
Greg Kurz 1641d2d5eb virtio-serial: Don't override some words
term-io.fs already overrides "emit", "key" and "key?" with its own version:

- "term-io-emit" calls the "write" method of the "stdout" instance, which
  then calls "virtio-serial-putchar"

- "term-io-key" calls the "read" method of the "stdin" instance, which then
  calls "virtio-serial-getchar"

- "term-io-key?" calls "serial-key?" if the device is a serial device, which
   is the case here and we already override "serial-key?" with
   "virtio-serial-term-key?".

It thus looks weird to rely on these shortcuts. Especially, when IOMMU is
enabled, we need a valid instance in "dma-map-in" and going through
"term-io-emit" buys us that.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-03-11 15:43:22 +11:00
Alexey Kardashevskiy 4b73a933c4 llfw: Fix debug printf warnings
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-03-11 15:43:22 +11:00
Stefan Berger 16a1867425 tcgbios: Measure the GPT table
Measure and log the GPT table including LBA1 and all GPT table entries
with a non-zero Type GUID.

We follow the specification "TCG PC Client Platform Firmware Profile
Specification" for the format of what needs to be logged and measured.
See section "Event Logging" subsection "Measuring UEFI Variables" for
the UEFI_GPT_DATA structure.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Stefan Berger 8a6b0d7061 tcgbios: Implement menu to clear TPM 2 and activate its PCR banks
Implement a TPM 2 menu and enable the user to clear the TPM
and activate its PCR banks.

The main TPM menu is activated by pressing the 't' key during
firmware startup.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Stefan Berger ae2e38c3ad tcgbios: Add TPM 2.0 support and firmware API
This patch adds TPM 2.0 support along with the firmware API that Linux
uses to transfer the firmware log.

The firmware API follows the "PFW Virtual TPM Driver" specification.
The API has callers in existing Linux code (prom_init.c) from TPM 1.2
times but the API also works for TPM 2.0 without modifications.

The TPM 2.0 support logs PCR extensions of measurements of code and data.
For this part we follow the TCG specification "TCG PC Client
Platform Firmware Profile Specification" (section "Event Logging").

Other relevant specs for the construction of TPM commands are:
- Trusted Platform Module Library; Part 2 Structures
- Trusted Platform Module Library; Part 3 Commands

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
[aik: removed new blank lines at EOF]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Stefan Berger a2ffcd9d65 tpm: Add TPM CRQ driver implementation
This patch adds a TPM driver for the CRQ interface as used by
the QEMU PAPR implementation.

Also add a Readme that explains the benefits and installation procedure
for the vTPM.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Stefan Berger 4f18bac8b0 qemu: Make print_version variable accessible
Make the print_version global variable accessible so that
we can measure the firmware version.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Greg Kurz efa56b851f fdt: Delete nodes of devices removed between boot and CAS
We recently fixed node creation at CAS in order to support early hotplug
of devices between boot and CAS. Let's handle node removal now to support
early hot *un*plug of devices.

This is achieved by associating a generation number to each FDT received
from QEMU and tagging all nodes with this number in a "slof,from-fdt"
property. The generation number is kept in the fdt-generation# variable.
It starts at 0 for the initial boot time FDT, and it is incremented at
each subsequent CAS. All boot time nodes hence get "slof,from-fdt" == 0,
all nodes present at CAS get "slof,from-fdt" == 1 and so on in case the
guest calls CAS again. If a device gets hot unplugged before quiesce, we
hence can detect it doesn't have the right generation number and thus
delete the node from the DT. Note that this only affects nodes coming
from the FDT. Nodes created by SLOF don't have the "slof,from-fdt"
property, and therefore cannot be candidates for deletion.
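
The tagging scheme can be illustrated with a small Python sketch. The
dictionary-based tree and the apply_fdt name are assumptions for this
example; the real logic lives in SLOF's Forth FDT code.

```python
def apply_fdt(tree: dict, fdt_paths: set, generation: int) -> None:
    """Tag nodes present in this FDT and delete stale FDT-origin nodes.

    tree maps a node path to its property dictionary (hypothetical layout).
    """
    # Create or refresh every node that the current FDT describes.
    for path in fdt_paths:
        tree.setdefault(path, {})["slof,from-fdt"] = generation

    # Nodes that came from an earlier FDT but are missing from this one
    # still carry an old generation number: they were hot unplugged.
    stale = [path for path, props in tree.items()
             if "slof,from-fdt" in props
             and props["slof,from-fdt"] != generation]
    for path in stale:
        del tree[path]
```

Nodes without the "slof,from-fdt" property (those created by SLOF) are
never matched by the stale filter, so they survive every pass.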

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Greg Kurz b7e579c856 fdt: Fix update of "interrupt-controller" node at CAS
Now that QEMU passes a full FDT at CAS without rebooting, a guest that
has switched from XICS to XIVE ends up being presented a malformed
"interrupt-controller" node in the DT:

# dtc -I fs -O dts /proc/device-tree
<stdout>: Warning (unit_address_vs_reg): /interrupt-controller: node has a reg or ranges property, but no unit name
...
        interrupt-controller {
                ibm,xive-eq-sizes = <0x10>;
                device_type = "power-ivpe";
                ibm,interrupt-server-ranges = <0x00 0x03>;
                compatible = "ibm,power-ivpe";
                #interrupt-cells = <0x02>;
                reg = <0x60302 0x31b0000 0x00 0x10000 0x60302 0x31a0000 0x00 0x10000>;
                phandle = <0xe7448a8>;
                ibm,xive-lisn-ranges = <0x00 0x03>;
                interrupt-controller;
        };

The node should have its unit set to "60302031b0000" as reported by dtc.
Also the node still has an "ibm,interrupt-server-ranges" property which
only makes sense with XICS.

This happens because we find an existing "interrupt-controller" node,
which describes a XICS controller, and we _wrongly_ decide to copy
all the properties from the new node into it. Delete the existing node
instead so that we create a new node with the appropriate properties
and unit name.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Alexey Kardashevskiy dea21476b1 fdt: Fix creating new nodes at H_CAS
So far we only allowed new ibm,dynamic-reconfiguration-memory and memory
nodes in the FDT update blob at ibm,client-architecture-support (CAS).
DRC do not have unit addresses and are easy; for memory nodes we use
an address from the node name.

For early hot plugged PCI devices (plugged after reset but before CAS)
we have to have a similar hack as for memory@ but parse the address
differently because of different binding.

Instead, this changes new nodes creation. At pass#0 when we copy phandles
from the FDT update blob to SLOF, we create new nodes with all
new properties and call "finish-device" only after all properties are
copied to the new nodes. At this point we particularly care about "reg"
as this is the unit address which SLOF parses for us and sets the unit
address in "finish-device"; we could skip other properties for later
passes.

Note this creates naked nodes with no methods normally added to the nodes
as this bypasses normal discovery which SLOF performs at start. So
if pass#1 does not find the node created in pass#0, this points to
missing "decode-unit" at the new node parent (happens when adding bridge-
under-bridge) and this prints a message and resets.

While at it, fix a few trailing spaces and comments.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[groug: - use fdt-reg-unit to set the unit name
        - consolidate finish-device and unit name for nodes and subnodes
          with a new fdt-cas-finish-device word ]
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-02-21 14:03:07 +11:00
Alexey Kardashevskiy 497f61400d fdt: Fix updating the tree at H_CAS
The previous approach to merge the QEMU FDT into the existing tree and
then patch it turned out to be broken as we patch properties based on their
names only so we patch not just what QEMU provides (which was
the intention) but also all properties SLOF created. This breaks one of
them - "interrupt-map" - it is created by QEMU for a PHB but SLOF creates
it for PCI bridges and since they have different sizes, patching phandles
at fixed offset fails.

Rather than skipping certain nodes in the SLOF tree, this uses a different
approach: now we read the QEMU FDT in 3 passes:
1. find all phandle/linux-phandle properties and store these in the SLOF
internal tree to allow phandle->node lookup later;
2. walk through all FDT properties, patch them if needed using
phandles from the SLOF tree and save patched values in SLOF properties;
3. delete phandle/linux-phandle properties created in 1. This is safe
as SLOF does not create these properties anyway.
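
The three passes can be sketched in Python as follows. This is an
illustration under stated assumptions, not the SLOF implementation: the
tree layout, the merge_fdt name and the choice of "interrupt-parent" as
the only phandle-carrying property are all hypothetical, and every FDT
path is assumed to exist in the SLOF tree already.

```python
PHANDLE_PROPS = ("phandle", "linux,phandle")
REFERENCE_PROPS = ("interrupt-parent",)   # assumed for this sketch only


def merge_fdt(slof_tree: dict, qemu_fdt: dict) -> None:
    """slof_tree: path -> {"addr": int, "props": dict}; qemu_fdt: path -> props."""
    # Pass 1: store QEMU's phandles in the SLOF tree for later lookup.
    for path, props in qemu_fdt.items():
        for name in PHANDLE_PROPS:
            if name in props:
                slof_tree[path]["props"][name] = props[name]
    lookup = {node["props"][name]: node["addr"]
              for node in slof_tree.values()
              for name in PHANDLE_PROPS if name in node["props"]}

    # Pass 2: copy only the properties QEMU provided, patching references.
    for path, props in qemu_fdt.items():
        for name, value in props.items():
            if name in PHANDLE_PROPS:
                continue
            if name in REFERENCE_PROPS:
                value = lookup.get(value, value)
            slof_tree[path]["props"][name] = value

    # Pass 3: remove the temporary phandle properties again.
    for node in slof_tree.values():
        for name in PHANDLE_PROPS:
            node["props"].pop(name, None)
```

Because pass 2 iterates over the QEMU FDT rather than over the SLOF
tree, properties that only SLOF created are left alone.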

Fixes: 44d06f9e68 ("fdt: Update phandles after H_CAS")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-12-05 15:18:54 +11:00
Alexey Kardashevskiy c50195e007 ibm,client-architecture-support: Fix stack handling
fdt-fix-cas-node returns the end address after it's finished which
the caller (ibm,client-architecture-support) does not use or drop.
This renames fdt-fix-cas-node to (fdt-fix-cas-node) and adds a wrapper
on top of that which does the drop. This will be used later for 2-pass
tree patching.

While at this, exit the function if memory allocation failed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-12-05 15:18:54 +11:00
Michael Roth 48b86c575f dma: Define default dma methods for using by client/package instances
They call parent node (which is a device) methods.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-12-05 14:41:07 +11:00
Alexey Kardashevskiy ac4c07a9a4 pci-phb: Reimplement dma-map-in/out
The immediate problem with the code is that it relies on memory allocator
aligning addresses to the size. This is true for SLOF but not for GRUB
and in unaligned situations we end up mapping more pages than bm-alloc
allocated.

This fixes the problem by calculating aligned DMA size before calling
bm-alloc.
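
As a worked example of the alignment arithmetic, here is a Python sketch
that counts the IOMMU pages covering an arbitrary buffer. The names
TCE_PS, TCE_MASK and dma_pages are chosen for illustration; the 4K page
size is the default mentioned in this message.

```python
TCE_PS = 0x1000           # default 4K IOMMU page size
TCE_MASK = TCE_PS - 1


def dma_pages(addr: int, size: int) -> int:
    """Number of IOMMU pages needed to cover [addr, addr + size)."""
    start = addr & ~TCE_MASK                      # round start down
    end = (addr + size + TCE_MASK) & ~TCE_MASK    # round end up
    return (end - start) // TCE_PS
```

An aligned 4K buffer needs one page, but the same 4K starting at 0x1800
straddles two pages, which is the case an allocator without size
alignment (such as GRUB's) can produce.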

While at this, simplify the code by removing global variables. Also
replace 1000/fff (the default 4K IOMMU page size) with tce-ps/mask.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
Changes:
v4:
* fixed code comments, tab/spaces
* fixed bm-alloc failure handling
2019-12-05 14:40:52 +11:00
Alexey Kardashevskiy 44d06f9e68 fdt: Update phandles after H_CAS
At the moment SLOF generates phandles, with a few exceptions such as
an interrupt controller (XICS/XIVE) and NVLink-related nodes. For these
nodes QEMU generates phandles which SLOF later detects and replaces with
the node addresses (which are phandles in SLOF).

However we are missing these updates when processing
the ibm,client-architecture-support client interface call - SLOF calls
QEMU with H_CAS to get an update for the device tree, and if that blob
contains phandles, they make it to the final tree unchanged with
undefined results.

This calls fdt-fix-phandles for the H_CAS update blob.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-08-27 11:50:26 +10:00
Alexey Kardashevskiy 5e4ed1fd0f rtas: Integrate RTAS blob
We implement RTAS as a simple binary blob which calls directly into QEMU
via a custom hcall. So far we were relying on QEMU putting the RTAS blob
to the guest memory with its location in linux,rtas-base/rtas-size.

The problems with this are:
1. we need to pick a location in the guest RAM in addition to SLOF, FDT
and sometimes kernel and init ram disk; having one less image makes QEMU's
life easier.
2. for secure VMs, it is yet another image which needs to be signed and
verified.

This implements "instantiate-rtas" completely in SLOF, including KVM PR
support ("broken sc1").

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-18 16:36:03 +10:00
Thomas Huth 8ae76e0f11 vio-vscsi: Support multiple channels / buses
The spapr-vscsi device of QEMU supports multiple channels (a.k.a. buses).
But when QEMU is started with a device on a bus > 0, SLOF fails to detect
the device, so that the boot fails. For example:

 qemu-system-ppc64 -nodefaults  -nographic -serial stdio -device spapr-vscsi \
  -blockdev driver=file,filename=/path/to/cdrom.iso,node-name=d1,read-only=on \
  -device scsi-cd,id=cd1,drive=d1,channel=6,scsi-id=5,lun=1

Thus SLOF should scan the various channels for bootable SCSI devices, too.
Since the common SLOF code for scanning SCSI devices has no notion of
"channels" or "bus", we simply fake the bus ID to be part of the target
ID, so instead of supporting 64 targets = 64 devices, we now support
8 channels * 64 targets = 512 devices.
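
One way to picture the combined ID is the Python sketch below. The exact
bit layout used by SLOF is not stated here, so the shift-by-6 encoding
and the function names are assumptions; only the 8 x 64 = 512 arithmetic
comes from the text.

```python
CHANNELS = 8
TARGETS_PER_CHANNEL = 64


def combined_id(channel: int, target: int) -> int:
    """Fold the channel into the upper bits of a single scan ID."""
    assert 0 <= channel < CHANNELS and 0 <= target < TARGETS_PER_CHANNEL
    return channel * TARGETS_PER_CHANNEL + target


def split_id(scan_id: int) -> tuple:
    """Recover (channel, target) from a combined scan ID."""
    return divmod(scan_id, TARGETS_PER_CHANNEL)
```

With this layout the device from the example command line (channel 6,
SCSI ID 5) would be reached at scan ID 389 when iterating over all 512
values.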

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1663160
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-01-10 18:21:05 +11:00
Thomas Huth cad96808d1 board-qemu/slof/vio-vscsi: Scan up to 64 SCSI IDs
QEMU supports up to 64 SCSI IDs on the vscsi "bus", see the string
"max_target = 63" in the source file hw/scsi/spapr_vscsi.c of QEMU.
However, SLOF currently only checks the first 9 IDs on the vscsi adaptor,
so when you try to boot from a CD-ROM like this, the boot fails:

 qemu-system-ppc64 ... -device spapr-vscsi,id=scsi0,reg=0x2000 \
    -drive file=/path/to/cdrom.iso,format=raw,if=none,id=dr1,readonly=on \
    -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=63,lun=1,drive=dr1,id=scd1

Thus let's change the number of IDs that we scan in SLOF to 64, too, to
match the ID range that QEMU provides.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2018-12-18 16:01:54 +11:00
Alexey Kardashevskiy 0198ba7759 fdt: Fix phandles for NVLink/NVLink2
The NVIDIA driver for NVLink2-capable GPU NVIDIA V100 discovers topology
between GPU/NPUs/GPURAM via the device tree which needs to have cross
references between device tree nodes.

This adds patching of the nodes needed for the driver to initialize.
As all these properties only contain phandles and nothing else, there is
no risk of accidental damage.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
* added the commit log
2018-09-07 13:18:50 +10:00
Alexey Kardashevskiy b752674383 fdt: Factor out code to replace a phandle in place
We generate a fake XICS phandle in QEMU and SLOF replaces that phandle
with the real one (i.e. SLOF's node address) in interrupt-parent and
interrupt-map properties. These properties are handled differently -
the interrupt-map is fixed in place while interrupt-parent is
decoded+encoded+set as a property.

This changes interrupt-parent fixing code to do what the interrupt-map
code does because soon we are going to have more phandles to fix and some
contain an array of phandles (such as "ibm,npu").

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* removed fdt-replace-l,
2018-09-07 13:18:50 +10:00
Thomas Huth 203f6686dc Fix bad assembler statements for compiling with gcc 8.1 / as 2.30
When compiling with a very recent toolchain, I get these warnings:

../../llfw/boot_abort.S: Assembler messages:
../../llfw/boot_abort.S:76: Warning: invalid register expression

and:

stage2_head.S: Assembler messages:
stage2_head.S:57: Warning: invalid register expression

The first one is using the wrong opcode, we should use "and" instead of
"andi" here. The second one is using a register instead of a constant
for load-immediate, which is nonsense, too. Fix it to use the right
constant instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2018-07-02 14:16:45 +10:00
Nikunj A Dadhania 8128b8e3ea OF: Use new property "stdout-path" for boot console
Linux kernel commit 2a9d832cc9aae21ea827520fef635b6c49a06c6d
(of: Add bindings for chosen node, stdout-path) deprecated chosen property
"linux,stdout-path" and "stdout".

Check for new property "stdout-path" first and as a fallback check
"linux,stdout-path". This older property can be deprecated after 5 years.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2018-03-05 14:55:59 +11:00
Alexey Kardashevskiy f8d999c7d9 rtas: Store RTAS address and entry in the device tree
At the moment we count on the guest kernel to update or create device
tree properties pointing to the instantiated RTAS copy which is not
very reliable.

This stores rtas-base and rtas-size in the DT at the instantiation
point so later on the H_UPDATE_DT hcall can supply QEMU with an updated
location of RTAS.

This supersedes f9a60de30 "Add private HCALL to inform updated
RTAS base and entry".

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v5:
* ditched rtas-entry, added rtas-size (which is always 20 bytes though)
2017-11-06 13:28:49 +11:00
Alexey Kardashevskiy 608e416bb1 board-qemu: Fix slof-build-id length
The existing code hardcodes the length of /openprom/model to 10 characters
even though it is less than that - len("aik")==3. All 10 chars go to
the device tree blob and DTC complains on such a property as there are
characters after terminating null:

aik@fstn1-p1:~$ dtc -f -I dtb -O dts -o dbg.dts dbg.dtb
Warning (model_is_string): "model" property in /openprom is not a string

This uses the real length and limits it to 10 to avoid breaking something.
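
In Python terms the fix amounts to the following sketch. The function
name is hypothetical; the 10-character cap and the "aik" example are
taken from the text above.

```python
MAX_MODEL_LEN = 10  # the old hardcoded length, now only an upper bound


def model_string(buf: bytes) -> bytes:
    """Return the C string stored in buf, capped at MAX_MODEL_LEN bytes."""
    end = buf.find(b"\0")
    length = len(buf) if end < 0 else end     # real string length
    return buf[:min(length, MAX_MODEL_LEN)]
```

For a 10-byte field holding "aik" followed by NUL padding this yields
the 3-byte string, so no bytes after the terminator reach the device
tree blob.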

Since the same code parses the build id field, this moves from-cstring
to a common place for both js2x and qemu boards.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-11-06 13:28:49 +11:00
Alexey Kardashevskiy e6fc84652c fdt: Pass the resulting device tree to QEMU
This creates a flattened device tree and passes it to QEMU via a custom
hypercall right before jumping to RTAS.

This preloads strings with 40 property names from CPU and PCI device nodes
and the strings lookup only searches within these.
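
A possible reading of this strings cache, sketched in Python: names in
the preloaded set are stored once and reused, while any other name is
appended without searching. The class and its methods are hypothetical
and do not mirror the Forth words in fdt-fl.fs.

```python
class StringsBlock:
    """Hypothetical model of an FDT strings block with a preloaded cache."""

    def __init__(self, preload):
        self.data = bytearray()
        self.cache = {}
        for name in preload:
            self.cache[name] = self._append(name)

    def _append(self, name: str) -> int:
        offset = len(self.data)
        self.data += name.encode() + b"\0"
        return offset

    def offset_of(self, name: str) -> int:
        """Reuse a cached offset, or append the name without searching."""
        if name in self.cache:
            return self.cache[name]
        return self._append(name)
```

Restricting the lookup to a small fixed set keeps each property name
lookup cheap, at the cost of duplicating names that are not preloaded.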

Test results on a guest with 256 CPUs and 256 virtual Intel E1000 devices
running on a POWER8 box:
FDTsize=366024 Strings=15888 Struct=350080 Reused str=12457 242 ms

A simple guest (one CPU, no PCI) with this patch as is:
FDTsize=15940 Strings=3148 Struct=12736 Reused str=84 7 ms

While we are here, fix the version handling in fdt-init. It only matters
a little for the fdt-debug==1 case though.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v6:
* fix memory sizes for free-mem
* store correct chosen-cpu to the header (used to be just 0)
* fdt-skip-string uses zcount now and works 30% faster
* moved to a new file - fdt-fl.fs

v5:
* applied latest comments from Segher
* s/fdt-property/fdt-copy-property/, s/fdt-properties/fdt-copy-properties/
* reduced the temporary buffers to 1MB each as the guest uses 1MB in total
anyway
* do not pass root phandle to fdt-flatten-tree, it fetches it from
device-tree itself
* reworked fdt-copy-properties to use for-all-words proposed by Segher

v4:
* reworked fdt-properties, works lot faster
* do not store "name" properties as nodes have names already

v3:
* fixed stack handling after hcall returned
* fixed format versions in both rendering and parsing paths
* rebased on top of removed unused hvcalls
* renamed used variables to have fdtfl- prefixes as there are already
some for parsing the initial dt

v2:
* fixed comments from review
* added strings cache
* changed last_compat_vers from 0x17 to 0x16 as suggested by dwg

---

I tested the blob by storing it from QEMU to a file and decompiling it.
2017-11-06 13:28:49 +11:00
Alexey Kardashevskiy 7b61ea3e5c fdt: Fix version and add a word for FDT header size
This fixes the version handling in fdt-init. It only matters
a little for the fdt-debug==1 case though.

This defines /fdth word for the FDT header size; this will be used
in the next patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-11-06 13:28:33 +11:00
Alexey Kardashevskiy 14a876d38d tree: Rework set-chosen-cpu and store /chosen ihandle and phandle
This replaces current set-chosen-cpu with a cleaner and faster
implementation which does not clobber the current node and stores
the chosen CPU phandle/ihandle.

This adds a helper to get the chosen CPU unit address.

This moves chosen cpu words to root.fs as otherwise it is quite hard
to maintain dependencies.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
2017-11-03 14:53:14 +11:00
Alexey Kardashevskiy b722179cf6 Revert various SLOF-to-QEMU private hypercalls
This reverts commits:

604d28cc3 "board-qemu: add private hcall to inform host on "phandle" update"
089fc18a9 "libhvcall: drop unused KVMPPC_H_REPORT_MC_ERR and KVMPPC_H_NMI_MCE defines"
1c17c13a5 "rtas: Improve error handling in instantiate-rtas"
f9a60de30 "Add private HCALL to inform updated RTAS base and entry"

A bigger hammer is coming soon which will pass the entire device
tree to QEMU, not just some random bits.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <groug@kaod.org>
2017-10-24 12:19:26 +11:00
Laurent Vivier ea31295cf3 Use input-device and output-device
QEMU can now set environment variables from the command line (with -prom-env).
By this means, we can set the output-device and input-device variables,
and SLOF can read it and set stdout and stdin accordingly.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-10-04 17:49:04 +11:00
Nikunj A Dadhania 685af54d8a virtio-net: rework the driver to support multiple open
Found that virtio-net is using a around 200K receive buffer per device, if we
connect more than 40 virtio-net devices the heap(8MB) gets over. Because of
which allocation starts failing and the VM does not boot.

Moreover, the driver did not support opening multiple device, which is possible
using the OF client interface. As it was using globals to store the state
information of the driver.

Now the driver allocates a virtio_net structure when a device is opened and
fills in the state information. These details are used by the various device
functions and finally for cleaning up on close.

Since the buffer memory is now allocated on open and freed on close, heap
usage is contained.
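
As a rough illustration of the per-open state described above (not the
actual SLOF driver, which is written in C and Forth), the sketch below
models a fixed-size heap and a driver object that allocates its receive
buffer on open and releases it on close. The Heap and VirtioNet classes
and the exact 200 KiB figure are assumptions made for the example.

```python
HEAP_SIZE = 8 * 1024 * 1024   # 8MB heap, as stated in the commit message
RX_BUFFER_SIZE = 200 * 1024   # "around 200K"; the exact size is an assumption


class Heap:
    """Hypothetical fixed-size heap that only tracks how many bytes are in use."""

    def __init__(self, size):
        self.size = size
        self.used = 0

    def alloc(self, nbytes):
        if self.used + nbytes > self.size:
            raise MemoryError("heap exhausted")
        self.used += nbytes
        return nbytes

    def free(self, nbytes):
        self.used -= nbytes


class VirtioNet:
    """Hypothetical per-open driver state: one instance per opened device."""

    def __init__(self, heap):
        # Allocate the receive buffer when the device is opened.
        self.heap = heap
        self.rx_bytes = heap.alloc(RX_BUFFER_SIZE)

    def close(self):
        # Return the receive buffer to the heap when the device is closed.
        self.heap.free(self.rx_bytes)
        self.rx_bytes = 0
```

With this model, opening and closing devices one after another never
grows the heap, while keeping 41 buffers alive at once no longer fits
into 8MB.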

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-08-07 18:24:58 +10:00
Greg Kurz 604d28cc3f board-qemu: add private hcall to inform host on "phandle" update
The "interrupt-map" property in each PHB node references the "phandle"
property of the "interrupt-controller" node. This is used by the guest
OS to set up IRQs for any PCI device plugged into the PHB. QEMU sets this
property to an arbitrary value in the flattened DT passed to SLOF.

Since commit 82954d4c10, SLOF has some generic code to convert all
references to any "phandle" property to a SLOF specific value.

This is perfectly okay for coldplug devices, since the guest OS only
sees the converted value in "interrupt-map". It is a problem though for
hotplug devices. Since they don't go through SLOF, the guest OS receives
the arbitrary value set by QEMU and fails to set up IRQs.

In order to support PHB hotplug, this patch introduces a new private
hcall, which allows SLOF to tell QEMU that a "phandle" was converted
from an old value to a new value.
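
The conversion problem can be sketched as follows. This is a simplified
model, not SLOF code: the device tree is a list of dictionaries, the
"interrupt-map-parent" key is a hypothetical stand-in for the phandle
reference embedded in "interrupt-map", and the returned mapping stands
for the information the new hcall would report to the host.

```python
from itertools import count


def replace_phandles(nodes, new_values):
    """Convert all phandles to new values and return the old -> new mapping."""
    mapping = {}
    # Pass 1: give every node that has a "phandle" a new value.
    for node in nodes:
        if "phandle" in node:
            old = node["phandle"]
            new = next(new_values)
            mapping[old] = new
            node["phandle"] = new
    # Pass 2: rewrite references to the converted phandles.
    for node in nodes:
        if "interrupt-map-parent" in node:
            old = node["interrupt-map-parent"]
            node["interrupt-map-parent"] = mapping.get(old, old)
    return mapping


# Coldplug: the tree goes through the conversion, so references stay consistent.
tree = [
    {"name": "interrupt-controller", "phandle": 0x1111},
    {"name": "pci", "interrupt-map-parent": 0x1111},
]
mapping = replace_phandles(tree, count(0x7E000000))

# Hotplug: a node created later by the host still carries the old value.
# Only with the mapping can it be made to point at the converted phandle.
hotplugged = {"name": "pci-hotplug", "interrupt-map-parent": 0x1111}
hotplugged["interrupt-map-parent"] = mapping[hotplugged["interrupt-map-parent"]]
```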

Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-25 15:10:03 +10:00
Thomas Huth c4dd5b62f0 pci: Avoid 32-bit prefetchable memory area if possible
PCI bridges can only have one prefetchable memory area. If we are
already using 64-bit prefetchable memory regions, we cannot use
a dedicated 32-bit prefetchable memory region anymore. In that
case the 32-bit BARs should all be located in the 32-bit
non-prefetchable memory space instead.
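
The rule for 32-bit BARs can be written down as a small decision
function. This is only a sketch of the policy stated above; the function
name and the returned labels are made up for the example and do not
correspond to SLOF words.

```python
def window_for_32bit_bar(prefetchable: bool, using_64bit_prefetch: bool) -> str:
    """Pick the memory window for a 32-bit BAR behind a PCI bridge."""
    # A bridge has only one prefetchable window. If it is already used for
    # 64-bit prefetchable regions, no dedicated 32-bit prefetchable window
    # is available, so every 32-bit BAR goes to non-prefetchable memory.
    if prefetchable and not using_64bit_prefetch:
        return "mem32-prefetchable"
    return "mem32-non-prefetchable"
```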

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-21 15:03:29 +10:00
Thomas Huth 1199592fb4 Define 'open' and 'close' words of the /aliases nodes right from the start
It's much easier to do this when we create the node instead of
looking up the device node again later in each of the boards.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-20 15:36:25 +10:00
Thomas Huth f72a37713f virtio-scsi: Allow LUNs bigger than 255
The virtio-scsi device expects LUNs according to a "Single level LUN
structure" as defined in the "SCSI Architecture Model" specification.
SLOF currently only uses the "Single level LUN structure using
peripheral device addressing method" which provides the possibility
to specify up to 256 different LUNs.
To be able to use LUNs greater than 255, the "Single level LUN structure
using flat space addressing method" has to be used instead. This can
be done by setting the top-most bits to "01" instead of "00" in the first
byte of the two LUN bytes.
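
A minimal sketch of the two encodings, assuming the two LUN bytes are
stored big-endian and that the flat space addressing method leaves 14
bits for the LUN. The function name is hypothetical and the code is an
illustration, not the Forth implementation used in SLOF.

```python
def encode_single_level_lun(lun: int) -> bytes:
    """Encode a LUN into the two LUN bytes of a single level LUN structure."""
    if not 0 <= lun <= 0x3FFF:
        raise ValueError("LUN does not fit into 14 bits")
    if lun < 256:
        # Peripheral device addressing method: top-most bits "00".
        value = lun
    else:
        # Flat space addressing method: top-most bits "01".
        value = 0x4000 | lun
    return value.to_bytes(2, "big")
```

For example, LUN 5 keeps the peripheral device encoding, while LUN 256
gets the "01" prefix in its first byte.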

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1431584
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-19 13:25:33 +10:00
Greg Kurz c39657a5f7 board_qemu: move code out of fdt-fix-node-phandle
This patch moves the code that actually alters the device tree to a
separate word, for improved readability. While here, it also makes
the comment of fdt-replace-all-phandles more accurate.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-17 13:51:57 +10:00
Greg Kurz 4c345ef71e board_qemu: drop unused values early in fdt-fix-node-phandle
These two values are pushed on the stack by decode-int and stay unused
until the 2drop line. Let's drop them right away to make it obvious.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-17 13:51:57 +10:00
Thomas Huth ed256fbdc5 pci: Improve the pci-var-out debug function
Print all related variables, using the code from phb-parse-ranges in
board-qemu/slof/pci-phb.fs, so that we can easily check all the values
from the SLOF prompt, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-07-17 13:19:06 +10:00