2017-06-11

Let’s Kill Write-Output

Intro

I think it is time we put down our old friend Write-Output for good. That’s right, Write-Output and not Write-Host. You should definitely not be using Write-Host outside of functions beginning with the Show- verb in console driven PowerShell apps, but I’m not here to kill off Write-Host. No, my target is Write-Output.

What is this craziness I’m speaking? Well, if I went back in time to 10 months ago and told myself not to use Write-Output, I would think my future self had gone mad. In fact, my journey towards murdering Write-Output began around that time.  There was a post in the /r/PowerShell  (sadly, I can’t find it in my history) where I was “correcting” someone for not using Write-Output. This prompted another user to link me to this discussion. At first I rejected the notion, but over time I came to embrace it. And today, I think I was crazy for not killing it off sooner.

What I’m saying here wont be new or revolutionary. My only hope with this post is to expose others to the issues with Write-Output and contribute to its demise.

What’s my beef with Write-Output? There are three concerns I have about using Write-Output: Performance, “Security”, and a “False Sense of Security”.

Before you begin, please understand that this is not an introduction-level PowerShell topic. This post was written with experienced PowerShell users and those who provide guidance and training to novice PowerShell users in mind. Also, this post is not specifically about console output, but about the use of the Write-Output command in general and its common use for “putting objects in the pipeline” (e.g. to “return” objects from a function). Most of it applies whether the objects ultimately go to the console or not. The only section dealing with console output is the Text Output section. Even then, the output for the targeted scenarios is more often a log file than it is a console.


Write-Output Is Slow

The first reason I have buried Write-Output in an undisclosed location is that it has performance issues. It’s slow compared it putting objects directly in the output stream. The difference is measurable at large iterations. I had a process for analyzing the permissions of every item in our OneDrive for Business sites and the run time went from several days to several hours by only removing all calls to Write-Output. There was already heavy parallelization, but with that many objects and with a bunch of helper functions acting on them, there were a large number of Write-Output calls being made per item times the billions of items it had to analyze.

How can something that is simply putting objects in the pipeline be so slow? First, it’s not that slow. Compared to the difference of the [PSCustomObject] type accelerator vs. using New-Object and Add-Member, which is orders of magnitude slower, putting objects directly in the pipeline vs. using Write-Output is only marginally slower. But, there are a few reasons.

First, there is a cost associated with the setup and teardown required for the invocation. If you look at the source code for Write-Output, its not all that complicated. It is essentially a wrapper for the Cmdlet.WriteObject() method. But each invocation does incur a cost to do that wrapping.

Second, if you are coding properly and making functions that Do One Thing, you will end up with a bunch of calls being made to Write-Output if you are using it to put objects in the pipeline. You might have your control script calling module functions calling private functions multiple times over and then have that in a loop repeating 100’s or 1000’s of times. That difference in speed adds up quickly.

Finally, it’s completely unnecessary bloat! Write-Output is wrapping Cmdlet.WriteObject(). But,  Cmdlet.WriteObject() is already called by default everywhere. So it is just inserting a call that will later be made again anyway. I might as well drive to the store and back before driving to the store if I want to call using Write-Output efficient.

But, don’t take my word for it. You can test this yourself.

'Write-Output Method: {0}' -f (
    Measure-Command {
        1..1000 | ForEach-Object {
            $Person = [PSCustomObject]@{
                FirstName = 'Bob'
                LastName  = 'Testerton'   
                Age       = 38
            }
            Write-Output $Person
        }   
    } | Select-Object -ExpandProperty TotalMilliseconds
)

'Declared and Passed method: {0}' -f (
    Measure-Command {
        1..1000 | ForEach-Object {
            $Person = [PSCustomObject]@{
                FirstName = 'Bob'
                LastName  = 'Testerton'   
                Age       = 38
            }
            $Person
        }   
    } | Select-Object -ExpandProperty TotalMilliseconds
)

'Straight to the pipeline method: {0}' -f (
    Measure-Command {
        1..1000 | ForEach-Object {
            [PSCustomObject]@{
                FirstName = 'Bob'
                LastName  = 'Testerton'   
                Age = 38
            }
        }   
    } | Select-Object -ExpandProperty TotalMilliseconds
)

The above code shows three methods for creating a [PSCustomObject] and putting it in the pipeline 1000 times. On my computer I get the following results:

Write-Output Method: 164.9981
Declared and Passed method: 75.5935
Straight to the pipeline method: 50.7954

Using Write-Output is 3 times slower than directly creating the object and sending to the pipeline. Even using the “Declared and Passed” method is much faster. This is a simple test, But you can see how it adds up the more logic, code separation, and iterations you use. In a simple script, it might not be a huge hit, but in something like the OneDrive process I mentioned at the beginning of this section it can have a profound impact.


“Security” and Stability Issues

The next reason that Write-Output  has checked in to the Motel Deep 6 is “security” and stability. This is actually two reasons bound by the a common issue. The core issue is that Write-Output is not a protected keyword in PowerShell. This means that command can be overwritten by a function or alias with the same name that has a higher Command Precedence.

“Security”

The ability to wrap, clobber, and/or proxy Write-Output is a double-edged sword. If you always use Write-Output to put objects in the pipeline you can do some really neat logging, telemetry, and debugging tricks by wrapping the command. Here is an example:

Function Write-Output {
    [CmdletBinding(
        HelpUri = 'https://go.microsoft.com/fwlink/?LinkID=113427',
        RemotingCapability = 'None'
    )]
    param(
        [Parameter(
            Mandatory = $true,
            Position = 0,
            ValueFromPipeline = $true,
            ValueFromRemainingArguments = $true
        )]
        [AllowNull()]
        [AllowEmptyCollection()]
        [psobject[]]
        ${InputObject},

        [switch]
        ${NoEnumerate})

    begin {
        try {
            $outBuffer = $null
            if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer)) {
                $PSBoundParameters['OutBuffer'] = 1
            }
            $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand(
                'Microsoft.PowerShell.Utility\Write-Output',
                [System.Management.Automation.CommandTypes]::Cmdlet
            )
            $scriptCmd = {& $wrappedCmd @PSBoundParameters }
            $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
            $steppablePipeline.Begin($PSCmdlet)
        } catch {
            throw
        }
    }

    process {
        $TimeStamp = Get-Date -Format o | foreach {$_ -replace ":", "."}
        $LogFile = '{0}\{1}-object.xml' -f $ENV:LOCALAPPDATA, $TimeStamp
        $LogObject = [PSCustomObject]@{
            Date      = Get-Date
            Object    = ${InputObject}
            CallStack = Get-PSCallStack
            PSCmdlet  = $PSCmdlet
        }
        $LogObject | Export-Clixml -Path $LogFile
        try {
            $steppablePipeline.Process($_)
        } catch {
            throw
        }
    }

    end {
        try {
            $steppablePipeline.End()
        } catch {
            throw
        }
    }
<#

    .ForwardHelpTargetName Microsoft.PowerShell.Utility\Write-Output
    .ForwardHelpCategory Cmdlet

#>
}

This will act exactly like Write-Output except that, in addition to putting the object in the pipeline, it will also logs the date, time, object contents, call stack, and $PSCmdlet to an xml file. This is awesome to see what all is going on where when and why… but it’s not just awesome from a developer and telemetry perspective. The other edge of the sword is data exfiltration. The Export-Clixml could just as easily be a REST API call dumping a JSON format of the same data to an attackers remote endpoint.

The reason why this is a “security” (with quotes) issue and not a security (without quotes) issue is that if an attacker has the ability to do this, you probably have bigger concerns than data exfiltration via PowerShell. It’s also not the only way to accomplish this goal. However, with the prolific use of Write-Output and the fact that just about anything with the same name will have a higher command precedence makes Write-Output a soft, easy, and dependable target. I’m not trying to insinuate that this is a “stop the presses” level security threat, but it’s there to consider.

Stability

Since Write-Output can be clobbered, depending on it could lead to stability issues. It’s not just hackers you need to worry about. It is also misguided users and developers. Someone may read my post, see that example above, and think it would be great to implement this in their environment to see what is going on. Let’s assume I have no idea what I’m doing (it’s a safe assumption Winking smile ) and that code results in 1 out of 5 objects disappearing into the ether. If your module or code makes use of Write-Output the user has now introduced a situation where your code is suddenly buggy and not working.

This not to put blame on developers who use Write-Output. No, the blame goes completely to the user for being crazy enough to do this. But it’s something to consider. If you look at projects like Pester and Microsoft’s PowerShell modules, you will see that many of them include safeguards against this behavior by ensuring that commands like Write-Output are the correct ones. (This is probably a best practice that should be adopted en masse.) Most projects, however, don’t think twice about using core PowerShell commands. It’s also not just the silly users you need to worry about. Silly developers can very easily create a function in the global scope to hijack Write-Output on module load.

This is not a problem unique to Write-Output by any means. The problem comes with its prolific use without steps taken to ensure the right Write-Output is used. The end result, however, is that you might end up chasing ghosts caused by some yahoo thinking their Write-Output is better.

Summary of “Security” and Stability Issues

To wrap up this section I want to summarize the issue:

Because Write-Output is not a protected keyword it can be wrapped, proxied, and/or clobbered resulting in, at worst, an avenue for data exfiltration by a malicious actor and, at least, stability issues caused by a misguided user or developer. While neither issue alone is completely worrisome in a larger context, they do, however, lend weight to avoiding the use of Write-Output.


Write-Output Provides a “False Sense of Security”

The last reason Write-Output is wearing brand new cement shoes while sleeping with the fishes is that it provides a “False Sense of Security” for knowing where output is generated.

One principle of programming that is good to code by is “explicit is better than implicit”. This axiom is ingrained in many of us and for good reasons. Implicit things tend to become broken unknowable things over time. Keeping activities in your code explicit helps maintain the story and flow inside the code so that yourself and others can come back to it later and know what’s going on.

This principle, however, is at odds with my “codecidal” tirade against Write-Output. Leaving the command out of the picture makes code more “implicit” instead of “explicit”. One phrase that permeates the PowerShell community is “implicit Write-Output”. It’s used to describe the following:

$ProcessNames = foreach ($Process in (Get-Process)) {
    $Process.Name
}

It would be said that $Process.Name is “being sent to the pipeline with an implicit Write-Output”. The implication is that the code actually running is this:

$ProcessNames = foreach ($Process in (Get-Process)) {
    Write-Output $Process.name
}

However, that is not the case. In the first block, Write-Output is never called. If you recall earlier I pointed out that Write-Output is a wrapper for Cmdlet.WriteObject(). That is what is actually called, not Write-Output.

What is more important here is the principle or essence of “implicit Write-Output” rather than the actual inner workings. The reason the first code block is often regarded as improper is that the action of writing an object to the output stream is implied rather than explicitly stated. It is generally improper to use “implicit” code over “explicit” code, so these people are correct, right?

Wrong! In the same code you will find authors doing this without batting an eye:

$ProcessNames = 'powershell', 'mmc'
$RunningProcesses = foreach ($ProcessName in $ProcessNames) {
    Get-Process -Name $ProcessName
}

Results from Get-Process are being implicitly sent to the pipeline. In their world, to be consistent they would need to do something like this:

$ProcessNames = 'powershell', 'mmc'
$RunningProcesses = foreach ($ProcessName in $ProcessNames) {
    $Results = Get-Process -Name $ProcessName
    Write-Output $Results
}

So what does this have to do with a “false sense of security”? The core of what “implicit Write-Output” is addressing is that explicit use of Write-Output makes it clear the intent is to send objects to the output stream and make that searchable in the code. However, this principle is misguided within the context of PowerShell thus leading to a false sense of security as to intent and location of sending objects to the pipeline.

I’m not a CS major, so I don’t know all the proper terms for various programing concepts, so if I get this part wrong, please feel free to correct me. PowerShell has basically three types of statements: Definitions, Assignments, and “putting objects in the pipeline”. Definitions are statements like Function, Class, etc. They are used to define piece of code.

Function Get-Widegt {
    [CmdletBinding()]
    param ( )
    process {
        $Widget
    }
}

Assignments are statements where values are assigned to variables.

$a = 5

Everything else is “putting objects in the pipeline”. That’s right, EVERYTHING ELSE. It doesn't matter how you color it or name it, all other statements in PowerShell put objects in the pipeline. Those objects might be nonexistent ([Void], $Null, empty collections, etc) but they all get passed to Cmdlet.WriteObject() in the end.

Get-Process
1000
5 + 10
'This is a String'
'This is a {0}' -f $test
$Processes
Write-Output $Processes
$object.Method()
[PSCustomObject]@{Property = 'Value'}

All of these can be summed up as “do something and put it in the pipeline”. That makes all of these statements related in their result regardless of the actions they are performing.

The reason “implicit Write-Output” is misguided is that the implicit nature of putting objects in the pipeline is a baked in feature of PowerShell. It’s not wrong, it’s different. This trips up so many people, like myself, who came from languages where it was not only improper to have more than one statement that generates a return object, but sometimes completely impossible without compile errors. In many languages, the moment you do their equivalent of “put objects in the pipeline” the current context exits. That is not the case in PowerShell.

This is where the “False Sense of Security” comes from. Having Write-Output gives us that warm cozy feeling return often provides in other languages. However, no matter what we do, output could be put in the pipeline by any line that is not an assignment nor definition. If we notice code is producing more than one object, Write-Output will not help us track down the rogue code. Sure it might help us know where we think output should be coming from, but for all we know a Write-Output could be spitting out [Void] and all the real output is in the middle of some pipeline several lines up.  Or, an author might very well have intended for an edge case to return more than one object and the lack of Write-Output for the mystery object is no indication of their lack of intent. Also, Write-Output could be in the middle or beginning of a pipeline and putting objects into that pipeline and not actually returning objects from the code block.

This was really hard for me to accept. Everything I have learned in my coding experiences screams at me to be explicit. But with Write-Output it can actually do more harm than good in PowerShell. What I had to do was embrace PowerShell’s pipeline greedy nature and let go of the “explicit is better than implicit” principle in this case. It’s still a good principle for all others, but it is clearly not a panacea.


Ding, Dong, Write-Output’s Dead, Now What?

Hopefully I have done a good job of making my case for killing Write-Output so far and you are ready to help me cover up the crime. But, I didn’t want to say “Don’t do that!” and not explain some alternatives. I will provide some of my personal best practices below.

Creating and Returning Objects

This is how you might normally see an object created and returned in a loop:

# Old way
$Processes = Get-Process
$Results = foreach ($Process in $Processes) {
    $Result = [PSCustomObject]@{
        ProcessName = $Process.Name
        Date        = Get-Date
    }
    Write-Output $Result
}

This is how I approach it:

# Do this
$Processes = Get-Process
$Results = foreach ($Process in $Processes) {
    [PSCustomObject]@{
        ProcessName = $Process.Name
        Date        = Get-Date
    }
}

I don’t even bother with the assignment. One recommendation I see made often is to do this:

# Or do this
$Processes = Get-Process
$Results = foreach ($Process in $Processes) {
    $Output = [PSCustomObject]@{
        ProcessName = $Process.Name
        Date        = Get-Date
    }
    $Output
}

If you really want to mark your output, you can use the same variable name in every instance. I personally think this wasteful and still provides that “false sense of security”.

I would urge strongly against doing this:

# Don't do this
$Processes = Get-Process
$Results = foreach ($Process in $Processes) {
    # Yield
    [PSCustomObject]@{
        ProcessName = $Process.Name
        Date        = Get-Date
    }
}

Comments like yield, output, return, etc. are troublesome. Code can get moved around and the comment may not be moved with it and no one would know because moving or not moving comments doesn’t break or fix code. You might end up with a yield comment above a pipeline that returns nothing and waste time finding out why that line is not producing output when the real cause was that someone moved the output line down but not the comment.

Text Output

As mentioned before, you should only be using Write-Host in console based applications in Show- functions. But, sometimes you still want text on the screen. A good example of this is psake builds on AppVeyor. Write-Output is often recommended over Write-Host because it allows you to mix strings and formatted objects without experiencing the race conditions produced by the pipeline vs. console dilemma.

Old way:

# Old Way
Write-Output "PowerShell Processes:"
Get-Process -Name PowerShell

My way:

# Do This
"PowerShell Processes:"
Get-Process -Name PowerShell

Just put the string by itself. This ends up being really handy for things like this:

$ProcessNames = 'PowerShell', 'mmc'
foreach ($ProcessName in $ProcessNames) {
    "{0} Processes:" -f $ProcessName
    Get-Process -Name PowerShell
    ' '
}

I can just do the format operation on the same line that I’m sending it to the output stream without having to wrap it in parenthesis or using subexpressions (in the case where I need an object property in the string).

You can do the same with here-strings for big blocks of text:

@'
This is a Large Block of text.

It has many lines.

This is so much easier to read.

Here are my running processes:
'@
Get-Process

It’s flexible and achieves all the same goals that Write-Output did for text-to-screen operations.

Replacing –NoEnumerate

One function of Write-Output is the –NoEnumerate parameter. This allows you pass an array as a single object instead of processing each array item as an individual object.

# Example Usage
$Array = @(1, 2, 3)
'Normal: '
Write-Output $Array | Measure-Object | Select-Object -ExpandProperty Count
'NoEnumerate: '
Write-Output $Array -NoEnumerate | Measure-Object | Select-Object -ExpandProperty Count

Result:

Normal:
3
NoEnumerate:
1

You can see that –NoEnumerate makes the array of three items appear as one object instead of three.

This can be accomplished unary comma operator:

# Do This
$Array = @(1, 2, 3)
'Normal: '
$Array | Measure-Object | Select-Object -ExpandProperty Count
'Unary Comma: '
,$Array | Measure-Object | Select-Object -ExpandProperty Count

Results:

Normal:
3
Unary Comma:
1



Conclusion

I hope we can now put this gruesome task of killing Write-Output behind us and move on to better things. I have already started removing it from all of my work. I no longer use it in examples when offering advice or answering questions. I think the community should stop teaching it altogether. It has some use-cases I can think of, but they are somewhat edge cases and not really worth discussing, in my opinion.

Ultimately, once you embrace PowerShell’s paradigm of being greedy with objects in the pipeline, Write-Output becomes superfluous. It looks like a real eye sore and triggers bad memories of tracking down rogue output and being lead astray by this seductive code siren. Once you realize that the majority of statements are “put objects in the pipeline” operations, you really start to question why you would ever use Write-Output in the first place.

R.I.P In Peace, Write-Output.
F

Join the conversation on /r/PowerShell!

2 comments:

lancestephens said...

Hey thanks for the words of advice. I'm still a neophyte at PowerShell, but your snippets really helped illustrate best practices.

Best,
Lance

Mark Kraus said...

Awesome! I'm glad you found it useful. For more best practices, be sure to check out https://github.com/PoshCode/PowerShellPracticeAndStyle